1.  INTRODUCTION


Detection  of  stochastic  signals  in  Gaussian  or  nonGaussian  noise  is  a  valid 
model  for  many  important  signal  detection  problems.  In  some  of  these  problems, 
the  noise  is  very  nonstationary  and  the  signal  cannot  be  represented  as  a  set  of 
narrowband  components.  Many  examples  can  be  given  of  applications  where  such 
problems  arise;  they  abound  in  such  areas  as  sonar  and  radar.  In  these 
situations, methods based on spectral analysis, assumptions of stationarity,
approximations  by  matched  filtering,  etc.  are  all  likely  to  be  unsatisfactory. 

Ideally,  one  would  use  signal  detection  algorithms  based  on  the 
likelihood  ratio.  The  main  difficulty  when  the  noise  is  Gaussian  is  that  the 
finite-dimensional  distributions  of  the  signal-plus-noise  process  are 
typically  unknown.  This  is  especially  true  when  the  signal-plus-noise  process 
is nonGaussian, so that its distribution cannot be characterized by its
covariance  and  mean  functions.  This  difficulty  has  led  to  the  use  of  various 
suboptimum  procedures,  typically  based  on  second-moment  criteria. 

However,  for  a  significant  class  of  problems  involving  stochastic  signals 
in  Gaussian  noise,  one  can  give  a  likelihood-ratio-based  detection  algorithm 
which  does  not  require  knowledge  of  the  finite-dimensional  distributions  of 
the  signal-plus-noise  process.  The  development  in  descriptive  fashion  of  such 
an  algorithm  (in  two  versions)  is  the  principal  content  of  this  chapter. 

In  addition  to  discussion  of  signal  detection  for  stochastic  signals  in 
Gaussian  noise,  a  much  shorter  discussion  will  be  given  for  detection  in 
nonGaussian  noise  of  a  Gaussian  mixture  type.  These  results  build  upon  those 
obtained  for  detection  in  Gaussian  noise. 

2.  PRIOR  WORK 

Likelihood-ratio-based signal detection results for problems involving
Gaussian noise have largely involved a few special cases:

(1) A known signal, for which the matched filter is a likelihood ratio test
statistic [1], [2], and signals known except for phase and amplitude [1],
[2].

(2)  A  Gaussian  signal-plus-noise  process,  for  which  quadratic-plus-linear 
operations  are  optimum  [3],  [4]. 

(3) A possibly nonGaussian signal when the noise is a Wiener process.

(4)  A  possibly  nonGaussian  signal  in  Gaussian  noise  when  signal  and  noise  are 
independent  [6],  [10]. 

Category  (1)  above  is  not  of  primary  interest  here. 

Category  (2)  contains  a  class  of  problems  that  is  often  encountered  in 
applications  such  as  sonar.  The  implementation  of  a  likelihood-ratio  detection 
algorithm  for  discrete-time  data  requires  knowledge  of  only  the  covariance 
matrices  and  mean  vectors  of  the  noise  and  signal-plus-noise  processes.  It  is 
frequently  possible  to  obtain  good  estimates  of  the  noise  covariance  matrix; 
the  noise  is  always  present  in  typical  applications,  and  sufficiently  slowly- 
varying  in  its  statistical  properties  to  enable  a  good  estimate  to  be  made  of 
its  covariance  matrix;  the  mean  is  frequently  zero.  We  shall  assume 
throughout  this  chapter  that  the  noise  covariance  can  be  reliably  obtained  and 
that  the  noise  has  zero  mean.  It  is  quite  a  different  matter  for  the  signal- 
plus-noise  covariance  and  mean.  The  signal  is  frequently  present  for  only 
relatively  short  periods,  and  its  properties  may  be  changing  too  rapidly  to 
enable  one  to  obtain  a  reasonable  estimate  of  its  covariance  and  mean. 
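
As a minimal sketch of the category (2) computation for discrete-time data, the following Python function evaluates the Gaussian log-likelihood ratio from the noise covariance matrix and the signal-plus-noise covariance matrix and mean vector. The function name, the argument names, and the use of NumPy are assumptions made for illustration only. The noise is taken to be zero-mean, as assumed throughout this chapter, and both covariance matrices are assumed to be nonsingular. The result is a quadratic-plus-linear function of the data vector x.

```python
import numpy as np

def gauss_vs_gauss_log_lr(x, cov_noise, cov_sn, mean_sn):
    """Log-likelihood ratio of N(mean_sn, cov_sn) against N(0, cov_noise) at x.

    All names are illustrative. Both covariance matrices are assumed to be
    symmetric positive definite, and the noise mean is assumed to be zero.
    """
    x = np.asarray(x, dtype=float)
    d = x - np.asarray(mean_sn, dtype=float)
    # Quadratic forms x' R_N^{-1} x and (x - m)' R_{S+N}^{-1} (x - m).
    q_noise = x @ np.linalg.solve(cov_noise, x)
    q_sn = d @ np.linalg.solve(cov_sn, d)
    # Log-determinants of the two covariance matrices.
    _, logdet_noise = np.linalg.slogdet(cov_noise)
    _, logdet_sn = np.linalg.slogdet(cov_sn)
    return 0.5 * (q_noise - q_sn) + 0.5 * (logdet_noise - logdet_sn)
```

In practice the difficulty described above remains: `cov_sn` and `mean_sn` must be supplied, and these are the quantities that are hard to estimate when the signal is present only briefly.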

This  problem  of  determining  the  covariance  and  mean  of  the  signal-plus- 
noise  process  limits  the  usefulness  of  the  likelihood  ratio  in  many 
applications  for  which  category  (2)  holds.  However,  even  if  one  is  able  to 
determine these quantities, the Gauss-vs.-Gauss model is too restricted to
effectively model many real-world situations.

Categories (3) and (4) are of most interest in the present context. With
regard to (3), actual use of a likelihood-ratio detection algorithm has been
limited  by  the  fact  that  conditions  for  its  application  are  rarely  satisfied. 
The  Wiener  process  has  very  unusual  properties,  not  encountered  in 
applications  such  as  sonar  or  radar.  For  example,  a  Wiener  process  has  sample 
paths that are almost surely non-differentiable at every time point t; also,
the  process  is  a  martingale  and  a  Markov  process.  These  properties,  even 
singly,  are  typically  not  found  in  applications.  Algorithms  based  on  the 
assumption  of  Wiener  process  noise  are  thus  very  unlikely  to  perform  in  a 
satisfactory  fashion. 

As  for  (4),  conditions  for  existence  of  a  likelihood  ratio  (nonsingular 
detection)  have  been  given  [6],  [10].  The  results  of  [10]  also  show  how  one 
can  in  principle  determine  the  likelihood  ratio  when  it  exists.  However,  the 
knowledge  required  in  order  to  actually  carry  this  out  will  seldom  be 
available,  comprising  as  it  does  the  finite-dimensional  distributions  of  the 
signal.

Since  signal-plus-noise  probability  distributions  are  so  difficult  to 
obtain,  suboptimum  procedures  not  requiring  this  knowledge  have  been 
extensively used. The deflection criterion gives a condition for optimality
which dates back to at least the early 1940's [11]. The criterion requires
one to specify a class of admissible test statistics, $\mathcal{T}$. The deflection of $T$
in $\mathcal{T}$ is defined by $D(T) = [E_{S+N}T - E_N T]^2 / [E_N T^2 - (E_N T)^2]$, where $E_{S+N}(\cdot)$ and
$E_N(\cdot)$ denote expectation with respect to signal-plus-noise and noise,
respectively. Deflection is used as the measure of performance. For the case
where  the  operations  on  the  data  are  assumed  to  be  quadratic-plus-linear,  and 
where  the  noise  is  Gaussian,  solutions  for  the  optimum  operations  under  this 
criterion were given in [12] and [13]. In this form, the solution requires
only knowledge of the mean and the covariance of the signal-plus-noise
process. If the signal-plus-noise process is also Gaussian, then the
likelihood ratio yields a quadratic-plus-linear operation as a test statistic.
However,  this  likelihood-ratio  test  statistic  is  not  the  same  quadratic-plus- 
linear  operation  as  that  obtained  using  the  deflection  criterion,  although 
there  are  some  interesting  relations  between  the  two  [13]. 
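
The deflection of a given test statistic can be estimated from samples, as in the following minimal Python sketch. It assumes that one has values of the statistic computed on noise-only data and on signal-plus-noise data; the function and argument names are hypothetical, and expectations are replaced by sample averages.

```python
import numpy as np

def empirical_deflection(t_noise, t_signal_plus_noise):
    """Sample estimate of the deflection of a test statistic.

    t_noise: values of the statistic under noise only.
    t_signal_plus_noise: values of the statistic under signal-plus-noise.
    Both names are illustrative, not taken from the text.
    """
    t_noise = np.asarray(t_noise, dtype=float)
    t_sn = np.asarray(t_signal_plus_noise, dtype=float)
    # Squared shift of the mean, normalized by the variance under noise.
    shift = t_sn.mean() - t_noise.mean()
    return shift**2 / t_noise.var()
```

Only first and second moments of the statistic enter, which is why the criterion can be applied without knowledge of the signal-plus-noise distributions.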

It should be noted that the Gaussian problem does not necessarily yield a
bounded quadratic-linear operation for the infinite-dimensional problem [3];
similarly, the supremum of the deflection over all bounded quadratic-linear
operations may not be achievable [13].

For  a  more  detailed  summary  of  results  mentioned  above,  and  additional 
references,  see  [14]. 

3.  OPTIMUM  DETECTION  OF  STOCHASTIC  SIGNALS  IN  GAUSSIAN  NOISE 
3.1.  Introduction 

Detection  of  stochastic  signals  imbedded  in  Gaussian  or  near-Gaussian 
noise  is  a  common  problem  in  such  areas  as  sonar  or  radar.  The  prevalence 
arises  because  of  physical  properties  of  the  medium  and  the  signal  source;  see 
[15],  [16]  and  the  references  given  there  for  examples  and  discussion  in  sonar 
applications. The importance of the problem has long been recognized;
solutions  have  lagged  behind.  Of  course,  one  model  for  which  detection 
algorithms  are  well-known  is  that  of  a  signal  known  except  for  phase  and 
amplitude.  This  is  not,  however,  a  sufficiently  rich  class  of  signal 
processes  to  adequately  model  many  of  the  important  sonar-radar  problems. 

Ideally,  one  would  have  a  detection  algorithm  having  the  following 
properties: 
(1) Likelihood-Ratio-Based. Monotone functions of the likelihood ratio are
optimum  test  statistics  under  a  variety  of  criteria  [1]. 

(2) Information-Preserving. Test statistics based on independent sampling of
the data will in general destroy information contained in the
continuous-time waveform. A continuous-time algorithm should be obtained, then
approximated as closely as possible by a discrete-time algorithm for
digital implementation.

(3) Implementable. An abstract expression for the likelihood ratio, based on
mathematical  quantities  that  cannot  be  obtained  in  applications,  is 
useless  as  a  detection  algorithm.  The  same  comment  applies  to  test 
statistics  whose  implementation  depends  upon  an  unrealistically-detailed 
knowledge  of  the  physical  environment. 

(4)  Adaptive.  The  algorithm  should  have  the  capability  of  adapting  to 
changes  in  the  environment  or  the  signal  source. 

Two  algorithms  will  now  be  described.  One  meets  all  four  of  the  criteria 
mentioned  above.  The  second  is  not  adaptive,  and  requires  more  prior 
information  than  the  first,  but  is  more  powerful  when  it  can  be  implemented. 

The  development  will  follow  the  path  along  which  the  algorithms  were 
originally obtained. A fixed Gaussian noise $(N_t)$ is considered; one is
attempting to detect a signal imbedded in this noise. The development begins
by considering the general continuous-time problems.

First, a characterization is given of processes $(Y_t)$ for which the
likelihood ratio of Y w.r.t. N exists. Conditions are then given that
guarantee existence of a likelihood ratio. Next, assuming that these
sufficient conditions are satisfied, representations of likelihood ratios are
obtained. These results are all for the continuous-time problem. In current
practice, detection algorithms will typically be implemented in digital format
after sampling. This requires that the continuous-time likelihood ratio be
approximated, and procedures for implementation of a detection algorithm
specified. This is the next step in the development, with a
partially-recursive formulation. Two versions of the algorithm are given.

The  development  can  be  given  in  several  forms  and  at  various  levels  of 
complexity.  We  have  elected  to  give  the  framework  and  results  for  the  general 
continuous-time problem in precise mathematical language. However, the proofs
are  only  summarized,  with  references  to  the  original  paper  [17]  for  the 
detailed  versions. 

Some  discussions  and  interpretations  of  the  main  results  for  the 
continuous-time problem are also included.

The development of the discrete-time approximation results in an
implementable algorithm based upon the continuous-time likelihood ratio. Our
objective in that section is to reach those readers who may have an interest
in actually implementing a detection algorithm. Thus, this part can be read
without serious reference to the development for the continuous-time analysis.
However, the results on approximation and implementation cannot be fully
appreciated without keeping in mind that they are derived from an analysis of
the continuous-time problem that makes no significant limiting assumptions on
the properties of either the noise or the signal-plus-noise processes, and
that the discrete-time algorithm is obtained as an approximation to a
continuous-time log-likelihood ratio.

The  reader  interested  in  genuine  engineering  applications  can  find 
discrete-time  implementations  of  detection  algorithms  in  Section  3.5. 

For the continuous-time problem, the basic setup is as follows. $(N_t)$ and
$(Y_t)$, $t$ in $[0,1]$, are real-valued stochastic processes on $(\Omega, \mathcal{B}, P)$. $(N_t)$ is
Gaussian, m.s. continuous, separable, and vanishes a.s. (almost surely) at
$t = 0$. $\nu_Y$ and $\nu_N$ are the induced measures on $\mathbb{R}^{[0,1]}$, the set of all
real-valued functions on $[0,1]$. $\mu_Y$ and $\mu_N$ are the induced measures on $L_2[0,1]$
(when $(Y_t)$ has paths a.s. in $L_2[0,1]$). $\mu_N$ is assumed to have
infinite-dimensional support. The following problems are considered.

(1) Determine conditions for existence of the likelihood ratios $d\nu_Y/d\nu_N$ and
$d\mu_Y/d\mu_N$.

(2) When absolute continuity holds, find $d\nu_Y/d\nu_N$ and $d\mu_Y/d\mu_N$.

The  interval  [0,1]  is  selected  only  for  convenience;  the  results  hold  for 
any  finite  interval. 

3.2.  Mathematical  preliminaries 

All stochastic processes will be real-valued and defined on a probability
space $(\Omega, \mathcal{B}, P)$ with index set $[0,1]$, unless otherwise noted. For a stochastic
process $(V_t)$, $\sigma_t^0(V)$ will denote the $\sigma$-field generated by $\{V_s, s \le t\}$, and
$\sigma_t(V)$ its P-completion. $\sigma^0(V)$ and $\sigma(V)$ are the corresponding filtrations:
$\sigma(V) = \{\sigma_t(V),\ 0 \le t \le 1\}$. $L_t(V)$ is the closed linear span in $L_2[P]$ of $\{V_s,
s \le t\}$.

Let $\sigma(Y)$ be the completed filtration determined by a stochastic process
$(Y_t)$. The predictable $\sigma$-field $\mathcal{P}[\sigma(Y)]$ is the smallest $\sigma$-field in $\Omega \times [0,1]$
containing all sets of the form $\{(\omega,t): X(\omega,t) \in A\}$, where A is any Borel set
in $\mathbb{R}$ and $(X_t)$ is any process adapted to $\sigma(Y)$ and having continuous (w.p. 1)
paths. A stochastic process $(W_t)$ is $\sigma(Y)$-predictable if the map $(\omega,t) \to W(\omega,t)$
is $\mathcal{P}[\sigma(Y)]/\mathcal{B}[\mathbb{R}]$ measurable.

$(N_t)$ will denote a Gaussian process that is m.s. (mean-square)
continuous, separable with respect to closed sets, zero-mean, and vanishes
(with probability one) at $t = 0$. We assume WLOG that $\mathcal{B}$ is the smallest
$\sigma$-field containing $\sigma(N)$: $\mathcal{B} = \sigma_1(N)$.

For measures $\tau$ and $\gamma$ defined on the same $\sigma$-field $\mathcal{A}$, $\tau \ll \gamma$ ($\tau$ absolutely
continuous w.r.t. $\gamma$) if for all sets A in $\mathcal{A}$ such that $\gamma(A) = 0$, one has
also $\tau(A) = 0$. $\gamma \sim \tau$ ($\tau$ equivalent to $\gamma$) if $\gamma(A) = 0 \Leftrightarrow \tau(A) = 0$.

For a positive integer $M < \infty$, $\mathcal{B}[\mathbb{R}^M]$ will be the Borel $\sigma$-field of $\mathbb{R}^M$ under
the product topology. $C_0[0,1] = C_0$ is the set of all real-valued functions
that are continuous on $[0,1]$ and vanish at $t = 0$. $C_0$ is endowed with the sup
norm topology, and $\mathcal{C}$ is the resulting Borel $\sigma$-field, also generated by the
evaluation maps $\{\pi_t,\ 0 < t \le 1\}$, $\pi_t(x) = x_t$. The Borel $\sigma$-field of $C_0^M$ is $\mathcal{C}^M$,
the product $\sigma$-field of M copies of $\mathcal{C}$. $\mathbb{R}^{[0,1]}$ is the space of real-valued
functions on $[0,1]$ and $\mathcal{B}^{[0,1]}$ is the $\sigma$-field in $\mathbb{R}^{[0,1]}$ generated by the
cylinder sets $\{x: (x(t_1), \ldots, x(t_n)) \in A_n\}$ for $n \ge 1$, $t_1, \ldots, t_n$ in $[0,1]$,
$A_n \in \mathcal{B}[\mathbb{R}^n]$.

$E(\cdot)$ will denote mathematical expectation; the underlying measure will be
clear  from  the  context. 

The approach used here makes extensive use of the spectral (Cramer-Hida)
representation of $(N_t)$ and its properties [18], [19]. $(N_t)$ has a proper
canonical representation, which we assume to have the form

$$N_t(\omega) = \sum_{i=1}^{M} \int_0^t F_i(t,s)\, B_i(\omega, ds). \qquad (2.1)$$

Without the assumption that $(N_t)$ is Gaussian, the $B_i$'s are zero-mean
orthogonal-increment processes, mutually orthogonal and m.s. continuous.
Their nondecreasing variances $E B_i^2(t)$ define Borel measures $\beta_i$ on $[0,1]$ in the
usual way: $\beta_i(a,b] = E B_i^2(b) - E B_i^2(a)$; moreover, $\beta_{i+1} \ll \beta_i$ for $i \ge 1$. Each
$F_i: [0,1] \times [0,1] \to \mathbb{R}$ is Borel-measurable, $F_i(t,s) = 0$ for $s > t$, and

$$\sum_{i=1}^{M} \int_0^1 \int_0^1 F_i^2(t,s)\, d\beta_i(s)\, dt < \infty.$$

$M \le \infty$ is the multiplicity of $(N_t)$.

The proper canonical representation (2.1) has the property that $L_t(N) =
L_t(B)$ for all t in $[0,1]$, and that $\sum_{i=1}^{M} \int_0^t F_i(t,s) g_i(s)\, d\beta_i(s) = 0$ for all t in
$[0,1]$ if and only if $g_i = 0$ in $L_2[\beta_i]$, all $i \le M$. Thus $\{F_i(t,\cdot),\ t \in [0,1]\}$
spans $L_2[\beta_i]$.

The  representation  (2.1)  is  an  equality  in  the  mean-square  sense,  thus 
holds  a.e.  dP  for  each  fixed  t.  However,  taking  both  sides  of  (2.1)  separable 
w.r.t.  closed  sets  gives  path  equality  a.e.  dP. 

The assumption that $(N_t)$ is Gaussian further implies that the $B_i$'s are a
Gaussian family; thus they are mutually independent, have independent
increments, and hence can be assumed to be path-continuous. Moreover,
equality of $L_t(N)$ and $L_t(B)$ implies that $\sigma_t(N) = \sigma_t(B)$, for all t in $[0,1]$.
Each $(B_i(t))$ is a martingale w.r.t. $\sigma(B)$, from the independence of the $B_i$'s,
and thus also w.r.t. $\sigma(N)$ and w.r.t. $\sigma(N) \vee \sigma(V)$, where $(V_t)$ is any stochastic
process independent of $(N_t)$.

M is the multiplicity of $(N_t)$. We assume throughout, unless otherwise
noted, that $M < \infty$. An extension of the results given here to Gaussian
processes of infinite multiplicity would require some results on
infinite-dimensional stochastic calculus that are not readily available.

For a vector stochastic process $(V_t)$ having paths a.s. in $C_0^M$, $P_V$ will
denote the induced measure on $\mathcal{C}^M$: $P_V(A) = P \circ V^{-1}(A)$, where V is the path map.
Similarly, if $(V_t)$ is a scalar process, then $\nu_V$ will be the probability
induced on $\mathcal{B}^{[0,1]}$ by the process. If $(V_t)$ is measurable and has paths a.s. in
$L_2[0,1]$ (Lebesgue square-integrable functions on $[0,1]$) then $\mu_V$ is the induced
measure on the Borel $\sigma$-field of $L_2[0,1]$ and V will denote the path map of $\Omega$
into $L_2[0,1]$.

We assume WLOG that support($P_B$) = $C_0^M$ and that supp($\mu_N$) = $L_2[0,1]$. Since
these measures are Gaussian, their supports are closed linear manifolds equal
to the closure of the ranges of their covariance operators [20]. One can thus
always work with this subspace; it preserves the original linear space
structure under the original norm (and inner product, for $L_2[0,1]$).

The representation (2.1) and the usual properties of a m.s. continuous
process generate a family of real separable Hilbert spaces. For a m.s.
continuous process $(V_t)$, let $H_V$ denote its RKHS (reproducing kernel Hilbert
space), with inner product $\langle\cdot,\cdot\rangle_V$. Let $r_V$ be the covariance function of $(V_t)$
and $R_V$ its (trace-class) covariance operator in $L_2[0,1]$; $R_V$ can be represented
as an integral operator with $r_V$ as its kernel. All elements in $H_V$ are
continuous functions on $[0,1]$. $R_V$ defines a Hilbert space $\mathcal{H}_V$ of $L_2[0,1]$
elements, consisting of range($R_V^{1/2}$) together with the inner product

$$\langle h, g\rangle_{\mathcal{H}_V} = \sum_n \langle g, u_n\rangle \langle h, u_n\rangle / \rho_n$$

where $\{\rho_n, n \ge 1\}$ and $\{u_n, n \ge 1\}$ are the non-zero eigenvalues and associated
orthonormal eigenvectors of $R_V$, and $\langle\cdot,\cdot\rangle$ is the $L_2[0,1]$ inner product. Since
it is assumed that support($\mu_N$) = $L_2[0,1]$, the eigenvalues $\{\lambda_n, n \ge 1\}$ of $R_N$
are all non-zero and the associated orthonormal eigenvectors are complete in
$L_2[0,1]$. For $L_2[\beta_i]$, $i \le M$, already defined, $\mathbf{H}$ will denote $\oplus_{i=1}^{M} L_2[\beta_i]$:
functions f of the form $f = (f_1, f_2, \ldots, f_M)$ with each $f_i$ in $L_2[\beta_i]$, and with
inner product of f and g given by $\sum_{i=1}^{M} \int_0^1 f_i(s) g_i(s)\, d\beta_i(s)$. Unitary maps
between these Hilbert spaces are described in the following lemma.

Lemma 3.1. Define the following linear maps.

$U_1: H_N \to \mathcal{H}_N$, $U_1 g = [g]$,

where $[g]$ is the equivalence class in $L_2[0,1]$ generated by g.

$U_2: \mathbf{H} \to H_N$, $(U_2 g)(t) = \sum_{i=1}^{M} \int_0^t F_i(t,s) g_i(s)\, d\beta_i(s)$,

where $g = (g_1, \ldots, g_M)$.

$U_3: L_1(N) \to H_N$, $U_3 N_t = r_N(t,\cdot)$.

$U_4: \mathbf{H} \to H_B$, $(U_4 g)_i(t) = \int_0^t g_i(s)\, d\beta_i(s)$.

Then the maps $U_1$, $U_2$, $U_3$, and $U_4$ are unitary.

Corollary.  Let  £  be  the  vector  function  with  ith  component  f^( s)  = 

XoF‘l( s,u)d/3.(u).  Then  is  in  and  <f*.  f;j>w  =  S^.EB^  tJB^s)  for  all 
i.j  £  M  and  s.t  in  [0,1]. 

For the main results to be discussed below, a sketch of each proof will
be given. For the sake of illustration, we will assume in each sketch that
$(N_t)$ has multiplicity M = 1 and that $B_1 = B$ is the standard Wiener process,
$E B^2(t) = t$. Thus, $N(t) = \int_0^t F(t,s)\, dB(s)$. This assumption of unit
multiplicity and $B_1$ the Wiener process is only used in the sketches to
simplify the illustration. The detailed proofs are in [17].
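
The unit-multiplicity model used in the sketches, N(t) = ∫ F(t,s) dB(s) over [0,t], can be simulated on a grid as in the following minimal Python sketch. The kernel F(t,s) = exp(-(t-s)) is a hypothetical choice made only for illustration, as are the function names and the left-endpoint discretization; none of these come from the text.

```python
import numpy as np

def kernel_matrix(F, n_steps):
    """Grid version of a causal kernel: A[j, k] = F(t_j, s_k) for s_k < t_j."""
    dt = 1.0 / n_steps
    t = dt * np.arange(1, n_steps + 1)   # right endpoints t_1, ..., t_n
    s = dt * np.arange(0, n_steps)       # left endpoints s_0, ..., s_{n-1}
    # np.tril zeroes the entries with s_k >= t_j, so that F(t, s) = 0 for s > t.
    return t, np.tril(F(t[:, None], s[None, :]))

def simulate_noise(F, n_steps, n_paths, rng):
    """Approximate N(t_j) by sum_k F(t_j, s_k) [B(s_{k+1}) - B(s_k)]."""
    dt = 1.0 / n_steps
    t, A = kernel_matrix(F, n_steps)
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return t, dB @ A.T

def discrete_variance(F, n_steps):
    """Riemann-sum approximation of E N(t)^2 = int_0^t F(t,s)^2 ds."""
    _, A = kernel_matrix(F, n_steps)
    return (A**2).sum(axis=1) / n_steps

def example_kernel(t, s):
    """Hypothetical kernel used only for illustration."""
    return np.exp(-(t - s))
```

For this kernel the exact variance is (1 - exp(-2t))/2, which `discrete_variance` approaches as the grid is refined. A call such as `simulate_noise(example_kernel, 200, 50, np.random.default_rng(0))` returns 50 approximate sample paths on a 200-point grid.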

Theorem 3.1. For $i \le M$, let $(\phi_i(t))$ be a measurable stochastic process
with paths a.s. in $L_2[\beta_i]$. Define a vector process $(Z_t)$ with paths a.s.
in $C_0^M$ by

$$Z_i(t) = \int_0^t \phi_i(x)\, d\beta_i(x) + B_i(t)$$

and stochastic processes $(S_t)$ and $(Y_t)$ by

$$S_t = \sum_{i=1}^{M} \int_0^t F_i(t,s)\phi_i(s)\, d\beta_i(s), \qquad Y_t = \sum_{i=1}^{M} \int_0^t F_i(t,s)\, dZ_i(s).$$

Then there exists $\Phi: C_0^M \to \mathbb{R}^{[0,1]}$ that is $\mathcal{C}^M/\mathcal{B}^{[0,1]}$ measurable, and
such that

(1) For all t in $[0,1]$, $N_t(\omega) = \pi_t(\Phi[B(\omega)])$ a.s. $dP(\omega)$;

(2) If $P_Z \ll P_B$, then for all t in $[0,1]$, $Y_t(\omega) = \pi_t(\Phi[Z(\omega)])$ a.s.
$dP(\omega)$.

Moreover, there exists a map $\Phi_1: C_0^M \to L_2[0,1]$ such that $\Phi_1$ is
Borel-measurable, and

($1_1$) $N(\omega) = \Phi_1[B(\omega)]$ a.e. $dP(\omega)$;

($2_1$) If $P_Z \ll P_B$, then $Y(\omega) = \Phi_1[Z(\omega)]$ a.e. $dP(\omega)$.

If $(N_t)$ has continuous paths, then there exists a map $\Phi_2: C_0^M \to C_0$
which is measurable, and

($1_2$) $N(\omega) = \Phi_2[B(\omega)]$ a.e. $dP(\omega)$;

($2_2$) If $P_Z \ll P_B$, then $Y(\omega) = \Phi_2[Z(\omega)]$ a.e. $dP(\omega)$.

None of the statements of this theorem require that $(N_t)$ be
Gaussian, nor that the multiplicity M be finite.


Remarks. As noted, the results of Theorem 3.1 do not require that $(N_t)$ be
Gaussian. However, when $(N_t)$ is Gaussian, and one further assumes that $(\phi_i)$
is $\sigma(B)$-adapted for $i \le M$, then $P_Z \ll P_B$ ([7], Theorem 7.2). Thus, the
relations given in (2), ($2_1$), and ($2_2$) all hold in this case.

Sketch of Proof. Since $E N_t^2 = \|F(t,\cdot)\|^2_{L_2[0,1]}$, the map $t \to F(t,\cdot)$ is uniformly
continuous as a map from $[0,1]$ into $L_2[0,1]$. $F(k/2^n, \cdot)$ can thus be approximated
by simple functions and $F(t,\cdot)$ by functions equal to $F(k/2^n, \cdot)$ on the interval
$I_{n,k} = ((k-1)/2^n,\ k/2^n]$, for $k \le 2^n$. $F(t,\cdot)$ can thus be approximated by sets of
simple functions.

For every simple function $s(x) = \sum_{i=1}^{m} c_i I_{(a_i,b_i]}(x)$ and every real-valued
function $f: [0,1] \to \mathbb{R}$, one can define a functional

$$Q_s: f \to \sum_{i=1}^{m} c_i [f(b_i) - f(a_i)].$$

A map $\Phi_n: C_0[0,1] \to \mathbb{R}^{[0,1]}$, induced by F, is then defined as follows: on
$I_{n,k}$, $\Phi_n[f] = \tau_n(f)$, where $\tau_n$ is the functional $Q_s$ when the simple function s is
an approximation to $F(k/2^n, \cdot)$.

The second-order properties of B are then sufficient to ensure that $(\Phi_n)$
converges in probability with respect to $P_B$, with the limit being a measurable
map $\Phi$. $\Phi$ is then shown to have the desired properties. □
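
The functional that the sketch of proof attaches to a simple function, namely the sum of the coefficients times the increments of f over the corresponding intervals, can be written directly, as in the following minimal Python illustration. The function name and the representation of a simple function as parallel lists of coefficients and intervals are assumptions made for this sketch.

```python
def q_functional(coeffs, intervals, f):
    """Evaluate sum_i c_i [f(b_i) - f(a_i)] for the simple function
    s = sum_i c_i 1_{(a_i, b_i]}.

    coeffs: the coefficients c_i.
    intervals: pairs (a_i, b_i) describing the half-open intervals.
    f: a real-valued function on [0, 1].
    All names are illustrative.
    """
    return sum(c * (f(b) - f(a)) for c, (a, b) in zip(coeffs, intervals))
```

For example, `q_functional([1.0, 2.0], [(0.0, 0.5), (0.5, 1.0)], lambda x: x * x)` evaluates 1(0.25 - 0) + 2(1 - 0.25). When f is a Wiener path, the same sum is the stochastic integral of the simple function with respect to that path, which is what makes the approximation of F by simple functions useful in the proof.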


Theorem 3.1 is a key tool for obtaining the results to follow. Formally,
of course, one can write N as the result of a mapping F on the space of
continuous functions such that N = FB. Intuitively this makes sense: in the
proper canonical Cramer-Hida representation, $\sigma_t(N) = \sigma_t(B)$ for all t in $[0,1]$.
Theorem 3.1 gives both existence and precision to this heuristic notion. It
should perhaps be noted that the mapping $\Phi: C_0^M \to \mathbb{R}^{[0,1]}$ gives a version of
$(N_t)$ from $(B_t)$; similarly, $\Phi$ gives a version of $(Y_t)$ from $(Z_t)$.

3.3.  Absolute Continuity (Existence of the Likelihood Ratio)

In this section several results are given for existence of a likelihood
ratio. If one has a process $(Y_t)$, interpreted as signal-plus-noise, then a
likelihood ratio will exist if the measure induced by $(Y_t)$ is absolutely
continuous with respect to the measure induced by $(N_t)$. In signal detection
terminology, this means that if the probability of false alarm of any test
statistic is equal to zero, then the probability of detection for that test
statistic is also equal to zero.

We will first give conditions that are necessary for existence of a
likelihood ratio. These conditions characterize the processes $(Y_t)$ such that
$d\nu_Y/d\nu_N$ and $d\mu_Y/d\mu_N$ exist.

Theorem 3.2. Let $(V_t)$ be a stochastic process independent of $(N_t)$.
Suppose that $(Y_t)$ is a process such that $\nu_Y \ll \nu_N$.

If $(Y_t)$ is adapted to $\sigma(N) \vee \sigma(V)$, then $Y_t = S_t + N_t^*$ a.e. dP for
each fixed t in $[0,1]$, where $(N_t^*)$ has the same finite dimensional
distributions as $(N_t)$, and is adapted to $\sigma(Y)$. $N_t^* = \sum_{i=1}^{M} \int_0^t F_i(t,s)\, dB_i^*(s)$
a.e. dP, each fixed t in $[0,1]$, where the $B_i^*$'s are mutually independent
zero-mean Gaussian processes. $(B_t^*)$ has the same law as $(B_t)$, and
$\sigma(B^*) = \sigma(N^*)$. Moreover,

$$S_t = \sum_{i=1}^{M} \int_0^t F_i(t,s)\phi_i(s)\, d\beta_i(s),$$

where $(\phi_i(t))$, $i \le M$, is a stochastic process that is $\sigma(Y)$-predictable
and has paths a.s. in $L_2[\beta_i]$.

Remark. Use of the filtration $\sigma(N) \vee \sigma(V)$ permits application of these
results to problems in several areas involving feedback. For example, in
information theory, the "message" $(V_t)$ can be a stochastic process independent
of the channel noise $(N_t)$; the transmitted signal depends in a causal manner
on this message and the channel output, which is of the form $Y_t = S_t + N_t$,
where $S_t = f[V_s, Y_s, s \le t]$ with f a "coding" function. Thus, one needs that
$(S_t)$ be adapted to $\sigma(V) \vee \sigma(N)$. In the present context, this filtration is
important for signal detection purposes. The message process $(V_t)$ is usually
independent of the ambient noise, although the eventual received signal
process may depend on both $(V_t)$ and $(N_t)$. For example, in active sonar
problems where reverberation is the dominant noise component, the transmitted
waveform is affected by the reverberation-causing scatterers between the
transmitter and the target as it travels through the water, as is the eventual
reflection from the target as this reflected waveform travels toward the
receiver. Since $(Y_t)$ will eventually represent signal-plus-noise, it is
obviously important that $(Y_t)$ be adapted to $\sigma(V) \vee \sigma(N)$.

Sketch of Proof. Let D denote the likelihood ratio (Radon-Nikodym derivative)
of $\nu_Y$ with respect to $\nu_N$ evaluated at the path $N(\omega) = \{N(\omega,t),\ t \in [0,1]\}$.
The conditional expectation $E[D \mid \sigma_t(N)]$ has a representation as a stochastic
integral with respect to B, since, by the proper canonical Cramer-Hida
representation, $\sigma_t(N) = \sigma_t(B)$. Let g be the function resulting from this
integral representation. Using Girsanov's theorem, one obtains a Brownian
motion G, in terms of B and g, with respect to the probability Q defined by
dQ = DdP. To represent Y as a signal imbedded in additive noise, one must
express G, B, and g as $N^*$, Y, and s, s being the "derivative" of the signal.
This is done by using the predictability properties of g. □

Theorem 3.2 states that when there is absolute continuity for Y with
respect to N, then Y has a signal-plus-noise representation, $Y = S + N^*$. It
should be carefully noted that $N^*$ is a process with the same family of
finite-dimensional probability distributions as N, therefore the same law on
$\mathbb{R}^{[0,1]}$. However, $(N_t^*)$ is adapted to $\sigma(Y)$. This is a fact of major importance.
Roughly, it states that Y has a signal-plus-noise representation where the
noise is a measurable function of the original noise and of the message
process (represented by $\sigma(V)$ in Theorem 3.2). However, since the likelihood
ratio of Y w.r.t. N depends only on laws induced by the two processes, and not
on the measurability (adaptation properties), the fact that $N^*$ is a function
of message and original noise does not affect the likelihood ratio. In sum,
the likelihood ratio of Y with respect to $N^*$ is the same as that of Y with
respect to N. This fact will be used later to partially justify the form of
the likelihood ratio used in the discrete-time approximation.

Theorem  3.2  contains  necessary  conditions  for  existence  of  a  likelihood 
ratio.  We  now  turn  to  sufficient  conditions. 

Theorem 3.3. Let $(V_t)$ be a stochastic process independent of $(N_t)$.
Suppose that $(S_t)$ is a stochastic process adapted to $\sigma(N) \vee \sigma(V)$ and with
paths a.s. in $H_N$.

(1) If $Y_t = S_t + N_t$ a.e. dP, for each fixed t in $[0,1]$, then $\nu_Y \ll \nu_N$.

(2) If $Y = S + N$ a.e. dtdP, then $\mu_Y \ll \mu_N$.

Sketch of Proof. Since the signal has paths which are almost surely in the
reproducing kernel Hilbert space of the noise, for almost every $\omega$, there is
(with respect to Lebesgue measure) an equivalence class of functions $[s_\omega]$ in
$L_2[0,1]$ such that

$$S(\omega,t) = \int_0^t F(t,x)[s_\omega](x)\, dx.$$

The main step in the proof consists in showing that one can choose a
representative $s_\omega$ in each class $[s_\omega]$ such that the resulting process $(s_t)$,
$s(\omega,t) = s_\omega(t)$, is a predictable stochastic process, thus has adequate
smoothness properties.

Once this is done, one turns to the well-known absolute continuity
properties of translates of Wiener processes. Theorem 3.1 permits one to
write $\tilde{N} = \Phi[B]$, where $\tilde{N}$ is a version of N and $\Phi: C_0[0,1] \to \mathbb{R}^{[0,1]}$ is
measurable. The above representation for $(S_t)$ gives $P_Z \ll P_B$, where $Z(\omega,t) =
\int_0^t s(\omega,x)\, dx + B(\omega,t)$, and by Theorem 3.1 again, one has a version $\tilde{Y}$ of Y
defined by $\tilde{Y} = \Phi[Z]$. Since $\nu_{\tilde{N}} = \nu_N$ and $\nu_{\tilde{Y}} = \nu_Y$, the result follows. □
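
The absolute continuity of translates of Wiener processes invoked in this sketch can be illustrated in its simplest form. For a deterministic, square-integrable shift s, the classical Cameron-Martin formula gives the log-likelihood ratio of Z = ∫ s dx + B with respect to B, evaluated along an observed path, as ∫ s dZ - (1/2) ∫ s² dx. The Python sketch below discretizes that formula on a uniform grid. The function name and grid convention are assumptions, and the predictable (random) signal derivative treated in this chapter requires the more general results cited in the text, so this is an illustration under simplifying assumptions and not the chapter's detection algorithm.

```python
import numpy as np

def cameron_martin_log_lr(shift, dz, dt):
    """Discretized log-likelihood ratio of a shifted Wiener process.

    shift: values s(x_k) of a deterministic shift on a uniform grid.
    dz: increments Z(x_{k+1}) - Z(x_k) of the observed path.
    dt: grid spacing.
    All names are illustrative.
    """
    shift = np.asarray(shift, dtype=float)
    dz = np.asarray(dz, dtype=float)
    # Sum of s dZ minus one half of the sum of s^2 dx.
    return float(np.sum(shift * dz) - 0.5 * np.sum(shift**2) * dt)
```

With a constant shift c on [0,1], the expression reduces to c Z(1) - c²/2, which gives a quick check of an implementation.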
The results of Theorem 3.3 are those that will be used to obtain a
detection algorithm for applications. They are also conditions that one may
hope to verify for actual physical problems. For example, they show that the
likelihood ratio will exist if the signal process has all (with probability
one) sample paths belonging to the RKHS of the noise, and if the signal
process up to time t (all t in $[0,1]$) can be obtained as a function of the
signal and noise up to time t. These are conditions that one may expect to be
satisfied. They are actually stronger than the sufficient conditions given in
Theorem 3.3, since those conditions assume only that $(S_t)$ is adapted to the
completed filtration $\sigma(V) \vee \sigma(N)$, rather than the uncompleted filtration
$\sigma^0(V) \vee \sigma^0(N)$. It is well-known [21] that when $(S_t)$ is adapted to
$\sigma^0(V) \vee \sigma^0(N)$, then for each t, $S_t$ can be expressed (with probability one) as
$S(\omega,t) = G_t[V(\omega,\cdot), N(\omega,\cdot)]$, where $G_t$ is a Borel-measurable map on $\mathbb{R}^{[0,1]} \times \mathbb{R}^{[0,1]}$ that
acts only on $V(\omega,\cdot)$ and $N(\omega,\cdot)$ up to time t.

We have been separately considering conditions for existence of
likelihood ratios for measures induced on $\mathbb{R}^{[0,1]}$ and for
measures induced on $L_2[0,1]$. In addition, if $(N_t)$ has continuous paths,
one may wish to consider existence of the likelihood ratio for the measures
induced on the space of continuous functions. In the next result, relations
are drawn between the existence of these several likelihood ratios.

Theorem 3.4. (1) Suppose that $(Y_t)$ is separable for closed sets and
adapted to $\sigma(N) \vee \sigma(V)$. Then $\nu_Y \ll \nu_N$ implies that
$(Y_t)$ has paths a.s. in $L_2[0,1]$ and $\mu_Y \ll \mu_N$. The converse is
false.

(2) Suppose that $(N_t)$ has continuous paths, and let $(Y_t)$ be any
process with continuous paths. Define $\lambda_Y$ and $\lambda_N$ to be the
induced measures on C[0,1]. Then

$$\nu_Y \ll \nu_N \iff \lambda_Y \ll \lambda_N \iff \mu_Y \ll \mu_N,$$

$$\nu_N \ll \nu_Y \iff \lambda_N \ll \lambda_Y \iff \mu_N \ll \mu_Y.$$

These equivalences do not require that $(N_t)$ be Gaussian or that $N_0 = 0$
a.s. dP.

Sketch of Proof. (1) The representation of Y as a signal embedded in additive
noise given in Theorem 3.2 holds almost surely at every fixed time point. The
separability assumption then produces pathwise equality, and consequently,
since the signal is continuous and the noise mean-square continuous, the
observation has paths in $L_2[0,1]$. One then applies Theorem 3.3 to obtain
the first statement; the converse is proved by using an example involving a
known signal.

(2) The natural injection map $i_1: C_0[0,1] \to \mathbb{R}^{[0,1]}$ is
measurable, and this is enough to yield those assertions which pertain to
absolute continuity for the spaces $\mathbb{R}^{[0,1]}$ and $C_0[0,1]$. Those
regarding $C_0[0,1]$ and $L_2[0,1]$ follow from a theorem due to Kuratowski
which states that Borel-measurable 1:1 maps between complete separable metric
spaces carry Borel sets into Borel sets [22], and the observation that
$C_0[0,1]$ can be continuously injected into $L_2[0,1]$. □

The following result illustrates some of the subtleties contained in the
statements  of  Theorem  3.2.  In  particular,  this  result  shows  that  the 
sufficient  conditions  of  Theorem  3.3  are  not  also  necessary  for  absolute 
continuity. 

Theorem 3.5. Let $(N_t)$ be any m.s. continuous zero-mean Gaussian process
on [0,1]. Suppose that $(Y_t)$ is a process adapted to
$\sigma(V) \vee \sigma(N)$, where $(V_t)$ is independent of $(N_t)$. If
$\nu_Y \ll \nu_N$ (resp., $\mu_Y \ll \mu_N$), then it is not necessary that
$Y_t = S_t + N_t$ a.e. dP for each fixed t in [0,1], where $(S_t)$ is adapted
to $\sigma(V) \vee \sigma(N)$ and has almost all paths in $H_N$ (resp., its
counterpart in $L_2[0,1]$). If $(N_t)$ and $(Y_t)$ have continuous paths,
then such a representation is not necessary for $\lambda_Y \ll \lambda_N$.

Sketch of Proof. Let Y be equal to V + N, with V Gaussian and independent of
N. It is possible to choose V such that absolute continuity obtains while the
paths of V are not (with probability one) in the reproducing kernel Hilbert
space of the noise: the covariance of V yields an integral operator of the
form $R_N^{1/2} T R_N^{1/2}$ with T Hilbert-Schmidt, but not trace-class. □


3.4. Likelihood Ratios

In this section, general expressions for likelihood ratios are obtained
under the assumption that the sufficient conditions of Theorem 3.3 are
satisfied. First, a result on the (regular) conditional probability of B
given N is obtained when N is viewed as a map into $\mathbb{R}^{[0,1]}$. Such
a conditional probability exists, since $(B_t)$ takes its path values in a
complete separable metric space while $(N_t)$ is defined on the same
underlying probability space $(\Omega, \mathcal{B}, P)$ as $(B_t)$ [23].

Let $T_n = \{t_1^n, \ldots, t_{p[n]}^n\} \subset\ ]0,1]$ be a strictly ordered
set of p[n] elements, for every n in $\mathbb{N}$, and
$T = \bigcup_{n=1}^{\infty} T_n$. The sets $T_n$ are increasing and chosen so
that T is dense in [0,1]. If S is a finite or countable set in [0,1] whose
elements are ordered as $\{s_1, s_2, s_3, \ldots\}$ and f is a function
defined on [0,1], then $\pi_S$ is the map defined by the relation

$$\pi_S f = (f(s_1), f(s_2), f(s_3), \ldots).$$


$\varepsilon_n$ is the map defined on $\mathbb{R}^\infty$ which retains the
first n components of every sequence. It will be the element of
$\oplus_{i=1}^{M} L_2[0,1]$ having ith component given by

Theorem 3.6. The conditional law $Q(f,\cdot)$ of B given that N = f is given
by the relation

$$Q(f,C) = I_C(m[f,\cdot]), \quad \nu_N\text{-a.e. } f,$$

where m is a continuous Gaussian stochastic process defined on
$\mathbb{R}^{[0,1]}$ which has, with respect to $\nu_N$, the same law as B
with respect to P. m is the weak limit of a sequence
$\{m^n, n \in \mathbb{N}\}$ of continuous Gaussian stochastic processes
defined on $\mathbb{R}^{[0,1]}$ which have components given by the relation


$1 \le i \le M$ ( U 2 is defined in Lemma 3.1). Moreover,
$[m \circ \Phi](c) = c$ a.e. $dP_B(c)$.

Sketch of Proof. One first notices that it suffices to compute the
conditional law of $\tilde B$ with respect to $\tilde N$, where $\tilde B$ and
$\tilde N$ are vectors obtained by sampling B and N at a dense and countable
set of time points. The problem is further reduced to a finite-dimensional
one by restricting attention to cylinder sets. One then uses the usual
formulae for conditional laws of Gaussian random variables, and then attempts
a limiting procedure.
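
For reference, the "usual formulae" invoked here can be written out for a pair of zero-mean, jointly Gaussian vectors. The block below is a standard fact stated in generic notation; the block names $\Sigma_{BB}$, $\Sigma_{BN}$, $\Sigma_{NN}$ are introduced here for illustration and are not the source's symbols, and $\Sigma_{NN}$ is assumed invertible.

```latex
% Zero-mean jointly Gaussian vectors B (in R^n) and N (in R^p), with
%   \Sigma_{BB} = E[B B^T], \Sigma_{BN} = E[B N^T], \Sigma_{NN} = E[N N^T] (invertible).
% The conditional law of B given N is Gaussian with
\begin{align*}
  E[B \mid N] &= \Sigma_{BN}\,\Sigma_{NN}^{-1}\,N, \\
  \operatorname{Cov}(B \mid N) &= \Sigma_{BB} - \Sigma_{BN}\,\Sigma_{NN}^{-1}\,\Sigma_{BN}^{T}.
\end{align*}
% The conditional mean is linear in the observed N; the conditional
% covariance does not depend on N at all.
```

This is the property used in the limiting argument: only the conditional mean carries information about the observed noise vector.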

The limiting procedure succeeds for the following reasons. Let
$\tilde B^n$ and $\tilde N^p$ be the finite-dimensional Gaussian vectors
obtained in the reduction outlined above. The conditional law of
$\tilde B^n$ given $\tilde N^p$ has a mean $\tilde m_{n,p}$ which depends on
$\tilde N^p$, and a covariance $\Sigma_{n,p}$ which does not. One must then
observe that the elements of the mean and the covariance can be expressed in
terms of finite-dimensional projections in the reproducing kernel Hilbert
space of the noise N. The time points being dense and N being mean-square
continuous ensure that these projections converge to the identity. The mean
$\tilde m_{n,p}$ produces a continuous Gaussian process which has a weak
limit m, with respect to $\nu_N$, and m is the point at which the mass of the
conditional law of B given N is concentrated. Finally, the fact that
$m \circ \Phi(c) = c$ a.e. $dP_B(c)$ follows from martingale inequalities.
Since this is not contained in [17], its proof will now be given in full.

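By Theorem 3.1, with $\pi_i(c)(t) = c_i(t)$, one has

$$P_B\{c \in C_0^M[0,1]: \sup_{0 \le t \le 1} |\pi_i[(m \circ \Phi)(c)](t) - \pi_i[c](t)| > \varepsilon\}$$

$$= P\{\omega: \sup_{0 \le t \le 1} |(m_i \circ \tilde N[\omega])(t) - B_i(\omega,t)| > \varepsilon\},$$

where $(\tilde N_t)$ is a version of $(N_t)$:
$\tilde N_t(\omega) = N_t(\omega) = \pi_t \Phi[B(\omega)]$ a.e. $dP(\omega)$
for each fixed t. By Theorem 3.6, m has w.r.t. $\nu_N$ the same law as B, so
that $((m_i \circ \tilde N[\omega])(t) - B_i(\omega,t))$ is a continuous
square-integrable martingale. This gives

$$P\{\omega: \sup_{0 \le t \le 1} |(m_i \circ \tilde N[\omega])(t) - B_i(\omega,t)| > \varepsilon\}$$

$$\le \frac{1}{\varepsilon^2}\, E|(m_i \circ \tilde N)(1) - B_i(1)|^2$$

$$= \frac{1}{\varepsilon^2} \left[ E(m_i \circ \tilde N)^2(1) + E B_i^2(1) - 2 E(m_i \circ \tilde N)(1) B_i(1) \right].$$

Since both second moments equal $\beta_i(1)$, it now suffices to show that
$E(m_i \circ \tilde N)(1) B_i(1) = \beta_i(1)$. $(m_i \circ \tilde N)(1)$ is
the limit in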

LgCdP]  of  ny»N(l),  where  [m^ox](l)  =  <w^.  fj,  e  r-iX^.  Thus, 

n  PL  J 
$$E(m_i \circ \tilde N)(1) B_i(1) = \lim_n E(m_i^n \circ \tilde N)(1) B_i(1)$$


tn  tn 

POl  _i  J  n  k  n 

=  lim  2  Vjj.k)  I  F  (t"s)d/J  (s)  J  F(t"  v)dp.(v) 
n  k, j=l  n,n  0  1  J  1  0  K  1 

=  =  pj(l).  Q 


We now go to the final step in the general continuous-time detection
problem: specification of the likelihood ratio when absolute continuity
exists.

Theorem 3.7. Suppose that $(V_t)$ is a stochastic process independent of
$(N_t)$, and that $(S_t)$ is a stochastic process adapted to
$\sigma(N) \vee \sigma(V)$ with almost all paths in $H_N$. Then
$S_t = \sum_{i=1}^{M} \int_0^t F_i(t,s)\,\phi_i(s)\,d\beta_i(s)$, where
$(\phi_i(s))$ is $\sigma(N) \vee \sigma(V)$-predictable and has paths a.s. in
$L_2[d\beta_i]$. Define $(Z_t)$ to be the process with ith component given by
$Z_i(t,\omega) = \int_0^t \phi_i(s,\omega)\,d\beta_i(s) + B_i(t,\omega)$. Then
$P_Z \ll P_B$ and $S_t + N_t = \sum_{i=1}^{M} \int_0^t F_i(t,s)\,dZ_i(s)$.

(1) Suppose $Y_t = S_t + N_t$ a.e. dP, for each fixed t in [0,1]. Then
$[d\nu_Y/d\nu_N](y) = [dP_Z/dP_B](m y)$ a.e. $d\nu_N(y)$, where m is defined
as in Theorem 3.6.

(2) Suppose $Y_t = S_t + N_t$ a.e. dtdP. Then
$[d\mu_Y/d\mu_N](x) = [dP_Z/dP_B](m x)$ a.e. $d\mu_N(x)$, where m(y) in
$C_0^M[0,1]$ has as its ith component
$[m(y)]_i(t) = \sum_{n \ge 1} \langle y, e_n \rangle \langle f_t^i, e_n \rangle / \lambda_n$.

(3) Suppose that $Y_t = S_t + N_t$ a.e. dP for each t in [0,1], and that
$(N_t)$ and $(Y_t)$ have continuous paths. Then
$[d\lambda_Y/d\lambda_N](x) = [dP_Z/dP_B](m\, i_2 x)$ a.e. $d\lambda_N(x)$,
where m is defined in (2) and $i_2: C[0,1] \to L_2[0,1]$ is the injection
map.

Sketch of Proof. This is a direct consequence of the fact that the maps m
defined in Theorem 3.6 and part (2) of Theorem 3.7 act as inverses of the maps
$\Phi$ and $\Phi_1$. The map m of part (1) was shown above to satisfy
$m \circ \Phi(c) = c$ a.e. $dP_B(c)$. For the map m of part (2), the fact that
$m \circ \Phi_1(c) = c$ a.e. $dP_B(c)$ is proved by using the fact that $P_B$
is Gaussian and the definition of m. □

Corollary. Under the hypotheses of part (2) of Theorem 3.7,
$m \Phi_1(x) = x$ a.e. $dP_B(x)$, with $\Phi_1$ as defined in Theorem 3.1 and
m as defined in part (2) of Theorem 3.7.

In obtaining conditions for existence of the likelihood ratio, the maps
$\Phi$ and $\Phi_1$ defined in Theorem 3.1 filled an essential role. Theorem
3.7 shows that their inverses (the m maps) play a correspondingly essential
role in specifying the form of the likelihood ratios.

Explicit representations for the likelihood ratio $dP_Z/dP_B$ can be obtained
from known results, since $(B_t)$ has independent components that are
path-continuous Gaussian martingales with respect to
$\sigma(N) \vee \sigma(V)$. See, for example, [9]. It will be noted that
although the proofs of absolute continuity require the proper canonical
representation of $(N_t)$, the results do not require that this decomposition
be known. This is a non-trivial consideration, since determining the
decomposition is well-known to be a very difficult problem. However, the
explicit expressions for the likelihood ratios do require knowledge of the
proper canonical representation. This constitutes a significant problem in
obtaining an implementable discrete-time detection algorithm based on the
continuous-time likelihood ratio.

The first main objective has now been achieved: beginning with a Gaussian
noise of a very general type, conditions have been obtained for existence of
the likelihood ratio, and expressions have been obtained for the likelihood
ratio when it exists.

Thus, the first part of the program is complete. We now move to the
second part, which is equally important: approximation and implementation of
the likelihood ratio.

3.5.  Discrete-Time  Approximation  and  Implementations 

The results given above are for continuous-time observations. The
expressions for the likelihood ratio given in Theorem 3.7 are very general,
but in that form are simply mathematical results. To be useful, they must be
converted into signal detection algorithms. The goal is to obtain an
algorithm that meets the four criteria discussed in Section 3.1:

(1) Likelihood-ratio-based;

(2) Information-preserving;

(3) Implementable;

(4) Adaptive.

Criteria (1) and (2) will be met by constructing two algorithms that are
discrete-time approximations to the general expressions for the likelihood
ratio obtained in Section 3.4 above. One of these algorithms, which will
eventually be denoted as Version I, requires only knowledge of the noise
covariance matrix; it is completely adaptive to the signal-plus-noise process.
Moreover, it is easily implementable. Version I thus satisfies all four of
the above criteria.

The Version II algorithm requires prior knowledge of a two-variable function
(the drift of a diffusion). If a model giving this function is not available,
then the function can be estimated from a "training" ensemble of
signal-plus-noise data. Thus it is not adaptive and is more difficult to
implement, but can be expected to generally perform better than Version I
when it can be implemented.

One further criterion will also be imposed on the implementation: a
partially recursive formulation.

We now proceed to the development of these two detection algorithms.
Version I, fully adaptive, is based on the following additional assumptions:

(A.1) The noise process has multiplicity M = 1, and the process $(B_1(t))$ is
the standard Wiener process (W(t)); thus
$N(t) = \int_0^t F(t,s)\,dW(s)$, where F is a Volterra kernel with
$\int_0^1 \int_0^1 F^2(t,s)\,ds\,dt < \infty$;

(A.2) The signal-plus-noise process can be represented as
$Y(t) = \int_0^t F(t,s)\,dZ(s)$, where the process $(Z_t)$ is a diffusion
with respect to a Wiener process $\hat W$ and has memoryless (and
time-homogeneous) drift function, so that

$$Z(t) = \int_0^t \sigma[Z(s)]\,ds + \hat W(t), \qquad (5.1)$$

where $P\{\omega: \int_0^1 \sigma^2[Z(\omega,s)]\,ds < \infty\} = 1$.

The second algorithm, Version II, assumes (A.1) above and that the
signal-plus-noise process can be represented as
$Y(t) = \int_0^t F(t,s)\,dZ(s)$, where

(A.3) $$Z(t) = \int_0^t \sigma[s, Z(s)]\,ds + \hat W(t), \qquad (5.2)$$

where again $\hat W$ is a Wiener process, and
$P\{\omega: \int_0^1 \sigma^2[s, Z(\omega,s)]\,ds < \infty\} = 1$.

The assumption (A.1) is reasonable from several viewpoints, such as the
fact that all stationary processes and all discrete-time processes are of unit
multiplicity and that any Gaussian vector can be represented as the result of
a lower-triangular matrix operating on white Gaussian noise. It is also known
[24] that unit multiplicity processes are dense (in $L_2[dPdt]$) in the class
of m.s. continuous processes. One can show that the assumption (A.1) is
satisfied whenever the noise process has a proper canonical representation


Book  -  2/14/90  -  25 


$N(t) = \int_0^t F(t,s)\,dB(s)$, where the variance of (B(t)) is an
absolutely continuous function on [0,1].
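
The remark that any Gaussian vector is a lower-triangular matrix acting on white Gaussian noise can be illustrated with a Cholesky factorization. The sketch below is illustrative only: the function names and the choice of a sampled Wiener covariance as the test case are assumptions made here, not part of the source.

```python
import math


def cholesky_lower(r):
    """Return lower-triangular `low` with low @ low^T == r.

    `r` is a symmetric positive-definite matrix given as a list of rows.
    """
    n = len(r)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = r[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return low


def wiener_covariance(n, dt):
    """Covariance of a Wiener process sampled at dt, 2*dt, ..., n*dt."""
    return [[dt * min(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]
```

For the sampled Wiener process the factor has every entry on and below the diagonal equal to `sqrt(dt)`: the sampled path is the running sum of independent increments of variance `dt`, which is the discrete analogue of (A.1) with F identically 1.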

To clarify the significance of the assumptions (A.2) and (A.3), it is
necessary to review well-known material on the representation of processes
$(Z_t)$ such that $P_Z \ll P_W$. From Theorem 7.11 of [9], any such $(Z_t)$
must be a process of "diffusion type":

$$Z(\omega,t) = \int_0^t \tau[s, Z(\omega,\cdot)]\,ds + \hat W(\omega,t) \qquad (5.3)$$

where $(\hat W_t)$ is a Wiener process adapted to $\sigma(Z)$, and $\tau$ is a
function on $[0,1] \times C_0[0,1]$ such that

i) for all s in [0,1], $\tau(s,x)$ does not depend on x(t) for any t > s;

ii) $P\{\omega: \int_0^1 \tau^2[s, Z(\omega,\cdot)]\,ds < \infty\} = 1$.

Using the results of Theorem 3.2 above, it can be seen that assumption (A.3)
reduces to the assumption that the function $\tau$ is memoryless. Of course,
there is a very large class of processes such that this assumption is
satisfied.

Note that this is emphatically not equivalent to the assumption that the
observed signal-plus-noise process is the solution to a stochastic
differential equation. To be precise, given that assumption (A.1) is
satisfied, the assumption (A.3) states that the signal-plus-noise process can
be represented as a filtered diffusion.
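
To make the phrase "filtered diffusion" concrete, the following sketch simulates a discretized version of the model: an Euler scheme for the diffusion Z, followed by a discrete Volterra filter. The Euler scheme, the uniform grid, and all names (`simulate_filtered_diffusion`, `kernel`, `drift`) are assumptions chosen here for illustration; the source does not prescribe a simulation method.

```python
import math
import random


def simulate_filtered_diffusion(kernel, drift, n, dt, rng):
    """Simulate Y(i*dt), i = 1..n, for Y(t) = int_0^t F(t,s) dZ(s).

    kernel(t, s): Volterra kernel F (only evaluated for s <= t).
    drift(t, z):  memoryless drift of the diffusion Z.
    Returns (y, dz) where dz[j] is the increment of Z over [j*dt, (j+1)*dt].
    """
    z = 0.0
    dz = []
    for j in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment, variance dt
        inc = drift(j * dt, z) * dt + dw     # Euler step for Z
        dz.append(inc)
        z += inc
    # Discrete filter: Y(i*dt) ~ sum_{k<=i} F(i*dt, k*dt) * dZ_k
    y = [
        sum(kernel((i + 1) * dt, (k + 1) * dt) * dz[k] for k in range(i + 1))
        for i in range(n)
    ]
    return y, dz
```

With `kernel = lambda t, s: 1.0` the filter is the identity on paths, so Y coincides with Z; a nontrivial kernel colors the diffusion in the same way that it colors the Wiener process to produce the noise.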

Assumption (A.3) is of course much weaker than (A.2); the latter assumes
not only that the process Z defined above is a diffusion, but that the drift
function is time-homogeneous:
$\tau[s, Z(\omega,\cdot)] = \sigma[Z(\omega,s)]$ a.e. $dP(\omega)ds$.

The reduction of the problem to the class of processes satisfying (A.2)
is motivated by the goal of developing a likelihood-ratio-based detection
algorithm that can operate without any prior knowledge of the signal
properties: a completely adaptive algorithm.

An algorithm will now be described, assuming (A.1) and (A.3), and uniform
sampling of the continuous-time waveform. For the detection problem as
defined above, applying Theorem 3.7 and known results for the Wiener process
(see, e.g., [9] or [8]), the general form (under a mild restriction) of the
likelihood ratio on $L_2[0,1]$ is

$$[d\mu_{S+N}/d\mu_N](x) = \lim_n \exp\{\Lambda_n[\delta_n[m(x)]]\}$$

where $0 = t_0^n < t_1^n < t_2^n < \cdots < t_n^n = T$ is a partition of
[0,T] such that $\sup_j |t_{j+1}^n - t_j^n| \to 0$,
$\delta_n(x) = (x(t_1^n), x(t_2^n), \ldots, x(t_n^n))$, and

$$\Lambda_n[\delta_n[m(x)]] = \sum_{i=0}^{n-1} \sigma(i, m[x](t_i^n))\,(m[x](t_{i+1}^n) - m[x](t_i^n)) - (1/2) \sum_{i=0}^{n-1} \sigma^2(i, m[x](t_i^n))\,(t_{i+1}^n - t_i^n), \qquad (5.4)$$

m is the map defined in part (2) of Theorem 3.7, and the limit exists in the
norm of $L_1[\mu_N]$.
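
The sum in (5.4) is straightforward to evaluate once samples of m[x] on the partition are available. The sketch below assumes those samples are given; the function name and the convention of passing the drift as a function of the time point (rather than of the index i) are choices made here for illustration.

```python
def log_likelihood_sum(drift, m_samples, times):
    """Evaluate the sum in (5.4) on a partition.

    drift(t, z):  the drift function sigma, evaluated at time t and state z.
    m_samples[i]: value of m[x] at times[i], with times[0] the left endpoint.
    Returns sum_i sigma_i * (m_{i+1} - m_i) - 0.5 * sum_i sigma_i^2 * (t_{i+1} - t_i).
    """
    total = 0.0
    for i in range(len(times) - 1):
        s = drift(times[i], m_samples[i])
        total += s * (m_samples[i + 1] - m_samples[i])
        total -= 0.5 * s * s * (times[i + 1] - times[i])
    return total
```

As a sanity check, a constant drift c gives c·m(T) − c²T/2 regardless of the partition, which is the familiar log-likelihood ratio for a Wiener process with constant drift.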

It should be noted that this approximation does not arise from sampling
of the continuous-time observation. The situation is much more complicated;
the approximation is obtained by sampling of the continuous-time function
m(x), where x is the observation. The difficulty is that m will generally not
be known.

The representation of (N(t)) by $N(t) = \int_0^t F(t,s)\,dW(s)$ yields that
$R_N = FF^*$, where F is the integral operator with F(t,s) as its kernel, and
$F^*$ is its adjoint. This can be used to provide an approximation to the
function m(x) appearing in (5.4) that does not require calculation of
eigenvalues and eigenvectors.

First, notice that

$$\langle e_j, f_t \rangle = \int_0^T \int_0^t F(s,u)\,du\; e_j(s)\,ds = \int_0^t \int_0^T F(s,u)\,e_j(s)\,ds\,du = [LF^* e_j](t),$$

where $[Lf](t) = \int_0^t f(v)\,dv$. L is considered as a map from
$L_2[0,1]$ into C[0,1]. Using this, the expression for m given in (2) of
Theorem 3.7 can be rewritten as

$$m[x](t) = \lim_{k \to \infty} \Big[ LF^* \sum_{j=1}^{k} \langle e_j, x \rangle\, \lambda_j^{-1} e_j \Big](t) = \lim_{k \to \infty} [LF^* R_N^{-1} P_k x](t) \qquad (5.5)$$

where $P_k x$ is the projection of the function x on the subspace in
$L_2[0,1]$ spanned by $\{e_1, \ldots, e_k\}$. Since
$R_N^{-1} = F^{*-1} F^{-1}$, the preceding becomes
$m[x](t) = \lim_{k \to \infty} [LF^{-1} P_k x](t)$.
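
The last simplification is a one-line formal computation, written out below. It is formal in the sense that the inverses are treated as if they were everywhere defined; on $L_2[0,1]$ they are in general unbounded, which is why the projections $P_k$ appear in (5.5).

```latex
% Formal identity behind the passage from (5.5) to m[x] = lim L F^{-1} P_k x,
% using R_N = F F^*:
\begin{align*}
  F^* R_N^{-1}
    &= F^* (F F^*)^{-1} \\
    &= F^* (F^*)^{-1} F^{-1} \\
    &= F^{-1}.
\end{align*}
% Hence L F^* R_N^{-1} P_k x = L F^{-1} P_k x for each k.
```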

A basic difficulty is that (with probability one [10]) the observation x
will not be in the domain of the operator $F^{-1}$, so that $F^{-1}x$ is not
defined. In fact, $LF^{-1}$ will in general not be a bounded linear operator.
However, for almost all sample functions x (either from noise or
signal-plus-noise), $m[x](\cdot)$ is a continuous function on [0,1]. Thus the
map m is a linear operator from $L_2[0,1]$ into C[0,1] whose domain includes
(with probability one) all sample functions of the noise and
signal-plus-noise processes.

The difficulty in implementation of the approximate likelihood ratio
(5.4) will lie in determining the function $\sigma$ and linear operator m.
$\sigma$ is a parameter of the signal-plus-noise process, and its estimation
is a problem of considerable interest in stochastic processes (as the drift
of a diffusion) and in stochastic filtering. The possibly unbounded linear
operator m, mapping $L_2[0,1]$ into C[0,1], depends only on the covariance
function of the noise. If $\sigma$ is known or estimated, and if the noise
covariance function is completely known, including its orthonormal
eigenvectors and associated eigenvalues, then the preceding expressions can
be used to obtain a discrete-time finite-sample approximation to the
likelihood ratio. Here we consider such approximations when one knows only
the covariance matrix of the noise.


$F_n^*$, where the matrix $F_n$ is lower triangular and $\Delta_n$ is the
sampling interval (uniform sampling). Now, the expression for m given above
is of the form

$$m[x](t) = \lim_{k \to \infty} [LF^{-1} P_k x](t),$$

where $R_N = FF^*$, L is the integration operator, and $P_k x$ is the
projection of x onto the subspace spanned by $\{e_1, \ldots, e_k\}$, where
$\{e_n, n \ge 1\}$ are o.n. eigenvectors of $R_N$. Thus, a reasonable
procedure is simply to replace this expression by
$m^n[x^n] = L_n F_n^{-1} x^n$, where $x^n$ is the observed data vector, an
element of $\mathbb{R}^n$, and $L_n$ is the summation operator in
$\mathbb{R}^n$: $(L_n x^n)_j = \sum_{i=1}^{j} x_i$.

However, without further analysis and justification this would simply be
an ad hoc assumption. Thus, we now examine the relation of
$L_n F_n^{-1} x^n$ to m(x), where
$x^n = (x(\Delta_n), x(2\Delta_n), \ldots, x(n\Delta_n))$.
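
The discrete map $m^n[x^n] = L_n F_n^{-1} x^n$ needs no explicit matrix inverse: because $F_n$ is lower triangular, $F_n^{-1}x^n$ can be obtained by forward substitution and $L_n$ is a running sum. The sketch below assumes `low` is a lower-triangular matrix playing the role of $F_n$, stored as a list of rows; the function names are illustrative.

```python
def forward_substitute(low, x):
    """Solve low @ b = x for b, where `low` is lower triangular."""
    b = []
    for i, row in enumerate(low):
        acc = x[i] - sum(row[k] * b[k] for k in range(i))
        b.append(acc / row[i])
    return b


def cumulative_sum(b):
    """Apply the summation operator: out[j] = b[0] + ... + b[j]."""
    out, total = [], 0.0
    for value in b:
        total += value
        out.append(total)
    return out


def m_n(low, x):
    """Discrete analogue of m: running sum of the whitened data."""
    return cumulative_sum(forward_substitute(low, x))
```

When `low` is the all-ones lower-triangular matrix (the discrete analogue of F identically 1, so that the noise is itself a sampled Wiener path), `m_n` returns the data unchanged, mirroring the fact that m is then the identity.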


By the Corollary to Theorem 3.7, $(m \circ \Phi_1)(x) = x$ a.e. $dP_W(x)$,
where $\Phi_1$ is defined in Theorem 3.1. Hence, the distribution of m(N) is
given by $P_W$, so that the vector
$\delta_n[m(N)] = (m(N)(\Delta_n), m(N)(2\Delta_n), \ldots, m(N)(n\Delta_n))$
has probability distribution $P_W \circ \delta_n^{-1}$, where
$\delta_n: C[0,1] \to \mathbb{R}^n$ is defined by
$\delta_n(x) = (x(\Delta_n), x(2\Delta_n), \ldots, x(n\Delta_n))$. Similarly,
defining $m^n(x^n) = L_n F_n^{-1} x^n$, $m^n(N^n)$ has probability
distribution $P_W \circ \delta_n^{-1}$, from the definition of $F_n$. Thus,
in the preceding expression (5.4) for $\Lambda_n[\delta_n[m(x)]]$, and
setting $t_i^n = i\Delta_n$, one can replace $\delta_n[m(x)]$ by $m^n[x^n]$:
with respect to $\mu_N$, the law of $\Lambda_n[m^n(x^n)]$ will be the same as
that of $\Lambda_n[\delta_n[m(x)]]$.

Next, suppose that the signal is present. In examining the relation
between the law of $\Lambda_n[\delta_n[m(x)]]$, as given by (5.4), and the
law of $\Lambda_n[m^n[x^n]]$, obtained by substituting $m^n(x^n)$ for
$\delta_n[m(x)]$, we make the following smoothness assumptions:

(a) $F_n(i,j) = F(i\Delta_n, j\Delta_n)$ for all $i, j \le n$;

(b) The law of the random vector in $\mathbb{R}^n$ with (i+1) component
given by

$$\int_0^{i\Delta_n} \sigma[s, Z(s)]\,ds + \hat W([i+1]\Delta_n)$$

is approximately the law of the random vector in $\mathbb{R}^n$ with (i+1)
component given by

$$\Delta_n \sum_{k=1}^{i} \sigma[k\Delta_n, Z(k\Delta_n)] + \hat W([i+1]\Delta_n),$$

where

$$Z(t) = \int_0^t \sigma(s, Z(s))\,ds + \hat W(t)$$

as in assumption (A.3).

Assumption (a) is essentially equivalent to assuming that

$$\int_0^{(i \wedge j)\Delta_n} F(i\Delta_n, s)\,F(j\Delta_n, s)\,ds = \Delta_n \sum_{k=1}^{i \wedge j} F(i\Delta_n, k\Delta_n)\,F(j\Delta_n, k\Delta_n)$$

for every $i, j \le n$. It is thus an assumption on the smoothness of
$\{F(t,\cdot),\ t \in [0,1]\}$. Note, however, that the smoothness
requirement applies, for fixed t, only to $F(t,\cdot)$ restricted to [0,t].
Similarly, (b) amounts to a smoothness assumption on $\sigma$.
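
How well the integral and the sum agree can be checked numerically for a specific kernel. The sketch below uses the exponential kernel F(t,s) = exp(-(t-s)) for s ≤ t, which is an example chosen here (it is not taken from the source), because the left-hand integral then has a closed form.

```python
import math


def exact_integral(i, j, dt):
    """int_0^{min(i,j)*dt} F(i*dt, s) F(j*dt, s) ds for F(t,s) = exp(-(t-s))."""
    m = min(i, j)
    return math.exp(-(i + j) * dt) * (math.exp(2.0 * m * dt) - 1.0) / 2.0


def riemann_sum(i, j, dt):
    """dt * sum_{k=1}^{min(i,j)} F(i*dt, k*dt) F(j*dt, k*dt) for the same kernel."""
    m = min(i, j)
    return dt * sum(
        math.exp(-(i - k) * dt) * math.exp(-(j - k) * dt) for k in range(1, m + 1)
    )


def relative_gap(n):
    """Relative mismatch at i = j = n on [0, 1] with sampling interval 1/n."""
    dt = 1.0 / n
    exact = exact_integral(n, n, dt)
    return abs(riemann_sum(n, n, dt) - exact) / exact
```

For this smooth kernel the relative mismatch shrinks roughly in proportion to the sampling interval, which is the behavior the smoothness assumption is meant to capture.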


Under $\mu_{S+N}$, letting $y^n$ be the vector such that
$y^n(k) = \int_0^{k\Delta_n} F(k\Delta_n, s)\,dZ(s)$,

$$m^n[x^n] = L_n F_n^{-1} y^n.$$

Assumptions (a) and (b) then give that the law of $m^n[x^n]$ is approximately
that of $\delta_n[m(x)]$, with the law of $\delta_n[m(x)]$ being
approximately that of the random vector with (i+1) component

$$\Delta_n \sum_{k=1}^{i} \sigma[k\Delta_n, Z(k\Delta_n)] + \hat W((i+1)\Delta_n).$$

Thus, with the smoothness assumptions (a) and (b),
$\Lambda_n[\delta_n[m(x)]]$ and $\Lambda_n[m^n[x^n]]$ have approximately the
same distribution w.r.t. $\mu_{S+N}$. From the nature of assumptions (a) and
(b), it can be seen that (if F and $\sigma$ are sufficiently smooth) the
approximations can be expected to become better (for a fixed observation
time) as n increases ($\Delta_n$ decreases).

We thus have, under the assumption that $\sigma$ is known, and with the
smoothness assumptions on F and $\sigma$, an approximation to the discretized
continuous-time likelihood ratio. The probability of false alarm ($P_{FA}$)
calculated under this approximation will be exactly that which one would
obtain with a discretized version of the exact continuous-time likelihood
ratio. The probability of detection ($P_D$) will be an approximation to that
which would be obtained using a discretized version of the exact
continuous-time likelihood ratio.

In most applications, of course, $\sigma$ will not be known. We now describe
two approaches to obtaining an estimate of $\sigma$, corresponding to the two
assumptions (A.2) and (A.3), with both based on replacing $\delta_n[m[x]]$
with $m^n[x^n]$ in the expression (5.4) for the discretized log-likelihood
ratio.

First, suppose assumption (A.3) is satisfied, and that a training
ensemble of representative signal-plus-noise data is available, consisting of
K vectors $\{x^i, i \le K\}$, each having n components, with
$x^i(j) = x^i(j\Delta_n)$. It is assumed that the vectors represent
independent samples. One first applies the matrix $F_n^{-1}$ to each element
of this ensemble. Under the assumptions (a) and (b) above, this gives the
ensemble of vectors $\{\delta Z^i, i \le K\}$, where

$$(\delta Z^i)(j) = Z^i[(j+1)\Delta_n] - Z^i[j\Delta_n]$$

$$= \Delta_n\, \sigma[j, Z^i(j\Delta_n)] + \hat W^i[(j+1)\Delta_n] - \hat W^i[j\Delta_n].$$

One now fixes j, and uses the K values having the above expression
($i \le K$) to estimate $\sigma(j,\cdot)$. Various procedures can be used to
carry out the estimation; note that defining
$(\delta \hat W^i)(j) \equiv \hat W^i[(j+1)\Delta_n] - \hat W^i[j\Delta_n]$,
the set $\{(\delta \hat W^i)(j), i \le K\}$ consists of i.i.d. random
variables, each Gaussian with mean zero and variance $\Delta_n$.
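
One of the "various procedures" could be an ordinary least-squares fit at each fixed j. The sketch below assumes, purely for illustration, that $\sigma(j,\cdot)$ is well approximated by an affine function a + b·z; that parametric form, and the function name, are assumptions made here and are not specified in the source.

```python
def fit_linear_drift(z_values, dz_values, dt):
    """Least-squares fit of dz/dt ~ a + b*z at one fixed time index j.

    z_values[i]:  Z^i(j*dt) for ensemble member i.
    dz_values[i]: the increment (delta Z^i)(j) for the same member.
    Returns (a, b). The Wiener increments act as zero-mean regression noise.
    """
    k = len(z_values)
    rates = [d / dt for d in dz_values]
    z_bar = sum(z_values) / k
    r_bar = sum(rates) / k
    s_zz = sum((z - z_bar) ** 2 for z in z_values)
    s_zr = sum((z - z_bar) * (r - r_bar) for z, r in zip(z_values, rates))
    b = s_zr / s_zz
    a = r_bar - b * z_bar
    return a, b
```

Because each increment carries noise of variance $\Delta_n$, the ratio δZ/Δ_n has variance 1/Δ_n, so a usable estimate at a single j requires K to be large; pooling neighboring time indices is one possible remedy when $\sigma$ varies slowly in time.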

Now, with this estimated $\sigma$ inserted into the expression (5.4) for
$\Lambda_n$, a sample vector $x^n$ is observed. $m^n[x^n]$ is then formed,
and used in the expression for $\Lambda_n$ to form the test statistic
$\Lambda_n[m^n[x^n]]$.

If a representative training ensemble of signal-plus-noise data is
available, or if $\sigma$ is known from a mathematical model, then the above
procedure gives the preferred mechanization. The algorithm employing this
estimate of $\sigma$ (or using a known $\sigma$) will be denoted as Version
II. In the case of non-stationary signal and noise, obtaining an ensemble of
S+N data, properly aligned, can be expected to be difficult. However, if the
signal-plus-noise is a stationary process, then one may opt to use a long
segment of S+N data to estimate a time-invariant $\sigma$; this segment could
then be much longer than the observation time over which the detection
algorithm is to perform. It can be shown [14] that use of a time-varying
$\sigma$ gives an exact likelihood ratio for the discrete-time problem if the
signal-plus-noise process is Gaussian.

Suppose next that nothing is known about the properties of the
signal-plus-noise process, and that an ensemble of training data is not
available. Version I of the algorithm (5.4) is then implemented as follows,
for a fixed value of n: The observed sample vector $x^n$ is first used to
estimate a time-homogeneous $\sigma$; the estimated $\sigma$ is inserted into
the expression (5.4) for $\Lambda_n$, and then $\Lambda_n[m^n(x^n)]$ is
evaluated. Thus, this version of the algorithm corresponds to assumption
(A.2).

The estimation of $\sigma$ is made from the single observed sample vector
$x^n$ which is to be tested for the presence of a signal. One applies
$F_n^{-1}$ to $x^n$; assuming that $x^n$ represents signal-plus-noise, this
yields a vector $\delta Z$, which has the representation (under assumptions
(a) and (b))

$$(\delta Z)(j+1) = \Delta_n\, \sigma[Z(j\Delta_n)] + (\delta \hat W)(j+1),$$

$$(\delta \hat W)(j+1) = \hat W[(j+1)\Delta_n] - \hat W[j\Delta_n].$$

The elements of $\{\delta \hat W(j), j \le n\}$ are i.i.d. random variables,
each Gaussian with mean zero and variance $\Delta_n$.
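
The source leaves the choice of single-vector estimator open. One simple possibility, sketched below, is a piecewise-constant (binned) regression: the range visited by Z is split into bins and the average of δZ/Δ_n is taken within each bin. The estimator, the number of bins, and all names are assumptions made here for illustration.

```python
def binned_drift_estimate(dz, dt, n_bins):
    """Piecewise-constant estimate of a time-homogeneous drift from one path.

    dz[j] is the increment of Z over [j*dt, (j+1)*dt]; Z(0) is taken to be 0.
    Returns a function sigma_hat(z).
    """
    # Reconstruct Z(j*dt) as partial sums of the increments.
    z = [0.0]
    for inc in dz[:-1]:
        z.append(z[-1] + inc)
    lo, hi = min(z), max(z)
    width = (hi - lo) / n_bins
    if width == 0.0:
        width = 1.0  # degenerate path: everything falls in the first bin
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for zj, inc in zip(z, dz):
        b = min(int((zj - lo) / width), n_bins - 1)
        sums[b] += inc / dt
        counts[b] += 1
    # Empty bins default to zero drift (an arbitrary convention).
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    def sigma_hat(value):
        b = min(max(int((value - lo) / width), 0), n_bins - 1)
        return means[b]

    return sigma_hat
```

Since the same vector is used both to estimate $\sigma$ and to form the test statistic, the estimate is fitted under the hypothesis that signal is present; the effect of this on the false-alarm rate would need to be examined for any particular estimator.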

The preceding discussion will now be summarized. First, $\sigma$ is either
known or else is estimated by one of the two procedures described above when
the observation is an n-component vector $x^n$. The test statistic, an
approximation to the continuous-time log-likelihood ratio under the
assumption that the noise has multiplicity M = 1, then has the expression
(with the definition of $\Lambda_n$ slightly changed from (5.4), and with
$(L_n F_n^{-1} x^n)_0 = 0$)

$$\Lambda_n(x^n) = \sum_{j=0}^{n-1} \sigma[j, (L_n F_n^{-1} x^n)_j]\,(F_n^{-1} x^n)_{j+1} - (\Delta_n/2) \sum_{j=0}^{n-1} \sigma^2[j, (L_n F_n^{-1} x^n)_j]. \qquad (5.6)$$

If now a new data point $x_{n+1}$ is observed, the approximation has the
recursive form

$$\Lambda_{n+1}(x^{n+1}) = \Lambda_n(x^n) + \sigma[n, (L_n F_n^{-1} x^n)_n]\,(F_{n+1}^{-1} x^{n+1})_{n+1} - (\Delta_n/2)\, \sigma^2[n, (L_n F_n^{-1} x^n)_n]. \qquad (5.7)$$

The above procedure requires relatively few additional operations when a
new data point is observed. The implementation and calculation of $\Lambda$
require the following operations. First, the function $\sigma$ must be known
and programmed or estimated from the observation. Given the value of
$\Lambda_n(x^n)$ and the observation $x^n = (x_1, \ldots, x_n)$, one stores
$\Lambda_n(x^n)$, $x^n$, $\sigma[n, (L_n F_n^{-1} x^n)_n]$, and
$(L_n F_n^{-1} x^n)_n$. When the data point $x_{n+1}$ is received, it is only
necessary to use the vector $x^{n+1}$ to calculate
$(F_{n+1}^{-1} x^{n+1})_{n+1}$, which means to cross-correlate the
observation vector $x^{n+1}$ with row n+1 of $F_{n+1}^{-1}$. This number, say
$b_{n+1}$, is then used to form $\Lambda_{n+1}(x^{n+1})$,

$$\Lambda_{n+1}(x^{n+1}) = \Lambda_n(x^n) + \sigma\Big[n, \sum_{i=1}^{n} b_i\Big]\, b_{n+1} - (\Delta_n/2)\, \sigma^2\Big[n, \sum_{i=1}^{n} b_i\Big]. \qquad (5.8)$$

Throughout this chapter, we have made the assumption that the noise
covariance is known. One then knows $\{F_n,\ n \ge 1\}$, and thus $\{F_n^{-1},\ n \ge 1\}$. As
mentioned, each new observation of a data point requires only cross-
correlation of the observed vector, an element of $\mathbb{R}^{n+1}$, with row n+1 of $F_{n+1}^{-1}$.
It is not necessary to apply the matrix $F_{n+1}^{-1}$ to the data vector.
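
The recursive update (5.8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by the authors: the function names `whitened_increment`, `recursive_update`, and `batch_statistic` are hypothetical, `sigma` is any user-supplied drift function `sigma(j, z)`, `delta` stands for the sampling interval, and the code uses 0-based indexing, so `b[n]` plays the role of $b_{n+1}$.

```python
import numpy as np

def whitened_increment(f_inv_row, x):
    """b_{n+1}: cross-correlate x^{n+1} with row n+1 of the inverse matrix."""
    return float(np.dot(f_inv_row, x))

def recursive_update(lam, z, n, b_next, sigma, delta):
    """One step of (5.8); z is the stored running sum b_1 + ... + b_n."""
    s = sigma(n, z)
    return lam + s * b_next - 0.5 * delta * s**2, z + b_next

def batch_statistic(b, sigma, delta):
    """Direct evaluation of the sum that the recursion accumulates."""
    b = np.asarray(b, dtype=float)
    # z[j] is the sum of the first j increments, with z[0] = 0 (assumed start).
    z = np.concatenate(([0.0], np.cumsum(b)[:-1]))
    s = np.array([sigma(j, zj) for j, zj in enumerate(z)])
    return float(np.sum(s * b - 0.5 * delta * s**2))
```

Only the current statistic and the running sum need to be carried between updates, which is the storage economy described above.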

Implementation of the recursive form of the algorithm is done most
conveniently when $\sigma$ is known, or when a training ensemble of S+N data can be
used to estimate $\sigma$. If one must estimate $\sigma$ from the observed data (Version I
algorithm), then the recursive formulation given above will need modification.
Various approaches can then be used for updating the estimate of $\sigma$, depending
on the amount of storage available, etc.

The performance of the algorithm can be expected to depend not only on
the properties of the data, but also on the sampling rate and the choice of
the specific estimation procedure for estimating $\sigma$. Thus, implementation for
a particular application should be preceded by an extensive study featuring
both simulated and experimental data.

4.  DETECTION  IN  GAUSSIAN  MIXTURE  NOISE


4.1.  Introduction 

In this section, the results of the preceding section are extended to
nonGaussian noise processes $(N_t)$ having a representation $N(\omega,t) = A(\omega)G(\omega,t)$
a.e. $dP(\omega)\,dt$, where $(G_t)$ is a mean-square (m.s.) continuous Gaussian process,
vanishing a.s. at t = 0, and of finite Cramer-Hida multiplicity, and $A$ is a
strictly positive random variable that is independent of $(G_t)$. Such noise
will be termed a "Gaussian mixture"; if also $EA^2 < \infty$, then $(N_t)$ is
spherically-invariant.

As  in  the  preceding  section,  we  will  first  discuss  absolute  continuity 
and  representation  of  the  likelihood  ratio  for  the  general  continuous-time 
problem.  This  will  be  followed  by  the  description  of  a  discrete-time 
approximation of the continuous-time likelihood ratio, and its implementation.

Although  the  noise  process  is  a  Gaussian  mixture,  the  signal-plus-noise 
process  need  not  be  of  this  type.  Results  on  absolute  continuity  and 
likelihood ratio when both processes are spherically-invariant are given in
[25],  [26]. 

A  general  analysis  of  the  problems  of  absolute  continuity  and  likelihood 
ratio  for  Gaussian  mixture  noise,  with  applications  to  the  Shannon  information 
of  communication  channels,  is  contained  in  [27]. 

4.2.  Absolute  Continuity 

The proofs for the results to be given closely follow those for the case
where $(N_t)$ is Gaussian. In the latter case, the problem of absolute
continuity was solved by the following general approach. First, it was shown
that versions (and thus their distributions) of the noise and signal-plus-noise
processes could be obtained by applying the same measurable
transformation (the function given in Theorem 3.1) to a Gaussian vector martingale
and to a process which resembles a process of "diffusion type" [17]. Then,
available results on the Wiener process were exploited to obtain equivalence
for the measures induced by the latter pair of processes on $C^M[0,1]$. These
results were then reflected back to the original processes to obtain the
results on absolute continuity. The existence of the measurable
transformation with the properties given in Theorem 3.1 does not require that
$(N_t)$ be Gaussian. Thus, to extend the results on absolute continuity from the
Gaussian $(N_t)$ to that of a Gaussian mixture $(N_t)$, one must only show that the
martingale  results  used  in  the  Gaussian  case  can  be  extended  to  the  Gaussian 
mixture problem. The Gaussian martingale results rely on Girsanov's theorem,
which can be summarized as follows. If $(M_t)$ is a continuous local martingale,
with natural increasing process $(\langle M\rangle_t)$, and if $(f_t)$ is a predictable process,
then $(\tilde M_t)$ defined by

$$\tilde M_t = \int_0^t f(s)\,d\langle M\rangle_s + M_t$$

is a continuous local martingale with $(\langle M\rangle_t)$ as its natural increasing
process, with respect to the measure $Q$ defined by

$$\frac{dQ}{dP} = \exp\Bigl\{-\int_0^1 f\,dM - \tfrac{1}{2}\int_0^1 f^2\,d\langle M\rangle\Bigr\},$$

and assuming that $Q$ is a probability.
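
As a concrete instance of the statement above (a standard special case, included here only for illustration and not taken from the source), let the martingale be the standard Wiener process $(W_t)$ on [0,1] and let the predictable process be a constant $c$. The natural increasing process is then $t$, and the theorem reads:

```latex
\text{shifted process: } W_t + ct, \qquad
\frac{dQ}{dP} = \exp\Bigl\{-cW_1 - \tfrac{1}{2}c^2\Bigr\}, \qquad
E_P\Bigl[\frac{dQ}{dP}\Bigr] = e^{-c^2/2}\,E_P\bigl[e^{-cW_1}\bigr] = e^{-c^2/2}\,e^{c^2/2} = 1 .
```

Thus $Q$ is a probability, and under $Q$ the process $(W_t + ct)$ is again a standard Wiener process: the change of measure removes the drift.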

When $(M_t)$ is a Gaussian martingale, its continuity and the form of $(\langle M\rangle_t)$
suffice to completely specify the law induced by $(M_t)$: it is a transformation
of Wiener measure. This is no longer true when $(\langle M\rangle_t)$ is random. In fact,
the identification of the law of $(M_t)$ with the knowledge of $(\langle M\rangle_t)$ is a
difficult unsolved problem [28]. In the case of Gaussian-mixture noise $(AB_t)$,
where $(B_t)$ is a vector Gaussian martingale, $\langle AB\rangle = A^2\Gamma_B$, where $\Gamma_B$ is the
covariance matrix of $B$, and it can be shown [27] that $\langle AB\rangle$ determines the law
of $AB$. This identification is done in two steps. First, one shows that if a
continuous local martingale $M$ is such that $\langle M\rangle = A^2\Gamma$, with $\Gamma$ a covariance matrix
and $A$ a strictly positive random variable, then $M = AB$ where $B$ is a Gaussian
martingale independent of $A$. Secondly, one shows that the law of $A$ with
respect to $Q$ (defined above) is the same as the law of $A$ with respect to $P$.

Prop. 4.1. If $B = M/A$, then $B$ is a Gaussian martingale independent of $A$.

This result is clear; independence of $A$ and $B$ follows from the fact that
$A$ is measurable w.r.t. each $\sigma$-field contained in the underlying filtration.

Prop. 4.2. If $U$ is a bounded function measurable w.r.t. all the $\sigma$-fields
of the underlying filtration, then $E_QU = E_PU$.

This result follows from $E_QU = E_P[UD_t]$, where $(D_t)$ is the martingale
obtained by taking the conditional expectation of the Radon-Nikodym derivative
$D = dQ/dP$, and $\lim_{t \downarrow 0} D_t = 1$ in $L_1[dP]$; one successively applies this equality to
the characteristic functions of A and H.

The  results  to  be  given  now  are  based  on  martingale  arguments  that  rely 
on  Prop.  4.1  and  Prop.  4.2.  The  next  result  is  the  generalization  of  Theorem 
3.3. 


Theorem 4.1. Suppose that $(Y_t)$ is a stochastic process adapted to
$\sigma(A) \vee \sigma(G) \vee \sigma(V)$, where $(V_t)$ is any process independent of $(G_t)$ and $A$.

(1) If $Y(t) = S(t) + N(t)$ a.e. $dP$ for each fixed t, where $(S_t)$ is a
stochastic process adapted to $\sigma(N) \vee \sigma(V)$ and with almost all paths
in $H_N$, then $\nu_Y \ll \nu_N$.

(2) If (1) holds, and also $Y(t) = S(t) + N(t)$ a.e. $dP\,dt$, then $\mu_Y \ll \mu_N$.

As can be seen, the above sufficient conditions for existence of the
likelihood ratio are very similar to those of Theorem 3.3. Similarly, the
representations of the likelihood ratio when these conditions are satisfied
are identical to that given in Theorem 3.7. Those representations are given
in [27]. In order to completely describe the likelihood ratio, it is
necessary to give the likelihood ratio $dP_Z/dP_{AB}$, where $(B_t)$ is a Gaussian
vector martingale, $A$ is a strictly positive random variable independent of
$(B_t)$, and $(Z_t)$ is a process with paths in $C^M[0,1]$ and such that $P_Z \ll P_{AB}$. The
likelihood ratio $dP_Z/dP_{AB}$ is given in [27], along with the appropriate
definitions. The derivation and consequent statement of that general result
is technically somewhat involved, and will be omitted. Instead, we make a
stronger set of assumptions that will suffice for eventual implementation,
similar to the results given in [16].

Before giving the result on likelihood ratios under these stronger
assumptions, we state an important result that is essential to eventual
implementation of the approximated likelihood ratio. In Section 3.3,
likelihood ratios (Theorem 3.7) were stated in terms of a function $m$ having
range in $C^M[0,1]$ and domain in either $\mathbb{R}^{[0,1]}$ or $L_2[0,1]$. The same functions
will occur in the Gaussian mixture problem, but now one must consider the
presence of the random variable $A$. This is simplified by the following
result.

Prop. 4.3. Let $m_a: \mathbb{R}^{[0,1]} \to C^M[0,1]$ be the $m$ defined in Theorem 3.6,
when $A = a$. Let $m'_a: L_2[0,1] \to C^M[0,1]$ be the function $m$ defined in (2)
of Theorem 3.7, when $A = a$. Then $m_a$ and $m'_a$ are independent of the value
of $a$.

With Prop. 4.3, likelihood ratios can be obtained. The following result
is the equivalent of Theorem 3.7.


Theorem 4.2. Suppose that $(V_t)$ is a stochastic process independent of
$(N_t)$, and that $(S_t)$ is a stochastic process adapted to $\sigma(N) \vee \sigma(V)$ with
almost all paths in $H_N$. Then $S_t = \sum_{i=1}^{M} \int_0^t F_i(t,s)\,\phi_i(s)\,d\beta_i(s)$, where
$(\phi(s))$ is $\sigma(N) \vee \sigma(V)$-predictable and has paths a.s. in … Define
$(Z_t)$ to be the process with ith component given by $Z_i(t,\omega) =
\int_0^t \phi_i(s,\omega)\,d\beta_i(s) + AB_i(t,\omega)$. Then $P_Z \ll P_{AB}$ and $S_t + N_t = \sum_{i=1}^{M} \int_0^t F_i(t,s)\,dZ_i(s)$.

Then

(1) Suppose $Y_t = S_t + N_t$ a.e. $dP$, for each fixed t in [0,1]. Then
$[d\nu_Y/d\nu_N](y) = [dP_Z/dP_{AB}](my)$ a.e. $d\nu_N(y)$, where $m$ is defined as in
Theorem 3.6.

(2) Suppose $Y_t = S_t + N_t$ a.e. $dt\,dP$. Then $[d\mu_Y/d\mu_N](x) =
[dP_Z/dP_{AB}](mx)$ a.e. $d\mu_N(x)$, where $m(y)$ in $C^M[0,1]$ has as its ith component
$[m(y)]_i(t) = \sum_{n \ge 1} \langle y, e_n\rangle \langle f_i^t, e_n\rangle / \lambda_n$.

(3) Suppose that $Y_t = S_t + N_t$ a.e. $dP$ for each t in [0,1], and that
$(Y_t)$ has continuous paths. Then $[d\lambda_Y/d\lambda_N](x) = [dP_Z/dP_{AB}](m\iota_0 x)$ a.e.
$d\lambda_N(x)$, where $m$ is defined in (2) and $\iota_0: C[0,1] \to L_2[0,1]$ is the
injection map.


The  following  result  is  a  special  case  of  Theorem  4.1  and  Theorem  4.2. 


Theorem 4.3. Let $(V_t)$ be any process independent of $(N_t)$. Suppose that
$(Y_t)$ is a stochastic process having the representation $Y_t = S_t + N_t$ a.e.
$dP\,dt$ where $(S_t)$ is adapted to $\sigma(N) \vee \sigma(V)$. For fixed $a$, let $(S_t^a)$ be
the process $(S_t)$ when $A$ is fixed and $A = a$, and set $Y_t^a = S_t^a + aG_t$.
Suppose that for each fixed $a$ in range($A$), $(S_t^a)$ has almost all paths in
$H_N$. Finally, suppose that $A$ is a discrete random variable with range in
$\{a_i,\ i \ge 1\}$. Then

(a) $\mu_Y \ll \mu_N$ and $\mu_{S^a + aG} \ll \mu_{aG}$ for every $a$ in $\{a_i,\ i \ge 1\}$;

(b) $\dfrac{d\mu_Y}{d\mu_N}(x) = \sum_{i \ge 1} I_{C(a_i)}(x)\,\dfrac{d\mu_{S^{a_i} + a_iG}}{d\mu_{a_iG}}(x)$ a.e. $d\mu_N(x)$,

where $I$ denotes the indicator function and
$C(a_i) = \bigl\{x: \lim_n \frac{1}{n}\sum_{j=1}^{n} \langle x, e_j\rangle^2/\lambda_j = a_i^2\bigr\}$.


Using  Theorem  4.3  and  Prop.  4.3,  one  can  apply  the  results  previously 
obtained  for  Gaussian  noise  to  obtain  the  likelihood  ratio  in  the  Gaussian 
mixture  case.  Theorem  4.3  contains  two  basic  assumptions  that  considerably 
simplify the general result. One is the assumption on smoothness of the
process $(S_t^a)$ for all $a$ in the support of $P \circ A^{-1}$. The second is the assumption
(to  give  the  form  of  the  likelihood  ratio)  that  A  has  discrete  support. 
Neither  is  necessary,  and  neither  is  made  in  [27],  where  the  general  results 
are  obtained.  However,  they  are  reasonable  for  applications,  which  is  the 
eventual  aim  here.  Note  that  under  the  assumptions  of  Theorem  4.3, 


$$A^2(\omega) = \lim_{n \to \infty} \frac{1}{n}\sum_{j=1}^{n} \langle x(\omega), e_j\rangle^2/\lambda_j$$

with probability one under both $P_N$ and $P_{S+N}$, by the law of large numbers; in
the case of $P_{S+N}$, one uses the fact that $(S_t^a)$ has paths a.s. in $H_N$. This
observation, Theorem 4.3, and Prop. 4.3 permit one to specify a likelihood
ratio for implementation quite naturally, as seen in the following section.
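
The law-of-large-numbers limit above can be checked numerically in the noise-only case. The following Python sketch is an illustration under assumptions, not part of the source: the function names are hypothetical, the eigenvalues are an arbitrary positive sequence, and the expansion coefficients of $aG$ are simulated directly as independent zero-mean Gaussian variables with variances $a^2\lambda_j$.

```python
import numpy as np

def simulate_coefficients(a, eigenvalues, rng):
    """Coefficients <x, e_j> of a*G: independent N(0, a^2 * lambda_j)."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return a * np.sqrt(eigenvalues) * rng.standard_normal(eigenvalues.size)

def estimate_a_squared(coeffs, eigenvalues):
    """Average of the squared coefficients, each normalized by its eigenvalue."""
    coeffs = np.asarray(coeffs, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return float(np.mean(coeffs**2 / eigenvalues))

rng = np.random.default_rng(1)
lam = 1.0 / np.arange(1, 20001) ** 2      # assumed eigenvalue sequence
coeffs = simulate_coefficients(2.0, lam, rng)
a_sq_hat = estimate_a_squared(coeffs, lam)  # should be close to 2.0**2
```

With A taking values in a discrete set, the estimate can then be matched to the nearest $a_i^2$ to select the term of the sum in part (b) of Theorem 4.3.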


4.3.  Approximation  and  Implementation  of  the  Likelihood  Ratio 

Here we shall make assumptions similar to (A.1), (A.2) and (A.3) of
Section 3.4. We also assume $EA^2 = 1$, so that $R_N = R_G$, and that $A$ is a
discrete random variable.

Version I of the algorithm is based on assumptions A.1 and A.2 below.

(A.1) The Gaussian process $(G_t)$ defining the Gaussian mixture noise process
has multiplicity M = 1, and the process $(B_1(t))$ is the standard Wiener
process, $(W_t)$. Thus

$$N(t) = AG(t) = A\int_0^t F(t,s)\,dW(s),$$

with $F$ a Volterra kernel such that $\int_0^1\!\int_0^1 F^2(t,s)\,ds\,dt < \infty$.

(A.2) The signal-plus-noise process $(Y_t)$ is given by

$$Y(t) = \int_0^t F(t,s)\,dZ(s),$$

where $(Z_t)$ has the representation

$$Z(t) = \int_0^t \sigma[Z(s)]\,ds + AW(t)$$

a.e. $dP$ for each t; and $P\{\omega: \int_0^1 \sigma^2[Z(\omega,s)]\,ds < \infty\} = 1$.

Version II is based on A.1 and on A.3, below.

(A.3) $Y(t)$ has the representation given in (A.2), but now

$$Z(t) = \int_0^t \sigma[s, Z(s)]\,ds + AW(t),$$

where $P\{\omega: \int_0^1 \sigma^2[s, Z(\omega,s)]\,ds < \infty\} = 1$.
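
A discrete-time analogue of the model in (A.1) and (A.2) can be simulated as follows. This is a rough sketch under assumptions that are not in the source: the names `simulate_z_increments` and `observe` are hypothetical, an Euler scheme on a uniform grid of [0,1] stands in for the diffusion, a single fixed value `a` stands in for a draw of A, and a lower-triangular matrix `F` stands in for the Volterra kernel.

```python
import numpy as np

def simulate_z_increments(sigma, a, n, rng):
    """Euler scheme on [0, 1]: dZ_i = delta * sigma(Z_i) + a * dW_i."""
    delta = 1.0 / n
    dw = rng.normal(scale=np.sqrt(delta), size=n)  # Wiener increments
    dz = np.empty(n)
    z = 0.0                                        # Z(0) = 0
    for i in range(n):
        dz[i] = delta * sigma(z) + a * dw[i]
        z += dz[i]
    return dz

def observe(F, dz):
    """Discrete analogue of Y(t) = int_0^t F(t,s) dZ(s): apply the
    lower-triangular part of F to the increments of Z."""
    return np.tril(F) @ dz
```

For example, with `F = np.tril(np.ones((n, n)))` the observation is simply the cumulative sum of the increments, and solving the triangular system recovers them, which is the operation that Step 1 of the Version I algorithm performs.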

Implementations of the discrete-time approximation to this likelihood
ratio are obtained from those given in Section 3.4 for the Gaussian noise
case, as follows. Here we assume the notation of Section 3.4: $F_n$ and $F_n^{-1}$
are as defined there, $L_n$ is the summation matrix, $\Delta_n$ is the (uniform)
sampling interval, and we assume an observation in $\mathbb{R}^n$. We first discuss the
implementation of the algorithm with time-homogeneous drift, with no prior
knowledge of the signal-plus-noise properties (Version I).

Step 1. Form the vector $F_n^{-1}x^n$. This gives, if signal is present and
with the smoothness assumptions (a) and (b) of Section 3.5, the vector
$(\delta Z^n)$ where $(\delta Z^n)(i+1) = \Delta_n\sigma[Z(i\Delta_n)] + aW([i+1]\Delta_n) - aW(i\Delta_n)$.

Step 2. Estimate $a$; for example, by computing the sample quadratic
variation of $Z^n$; that is, $(\hat a)^2 = \sum_{i=1}^{n} [(\delta Z^n)(i)]^2$.

Step 3. Assuming that $(\delta Z^n)(i+1) = \Delta_n\sigma[Z(i\Delta_n)] + \hat a\,[W([i+1]\Delta_n) - W(i\Delta_n)]$
for $i \le n-1$, estimate $\sigma$.

Step 4. Insert $\hat\sigma$ into the expression (5.4) of Section 3, for $\Lambda^n$, and
then evaluate $\Lambda^n[m_n[x^n]]/(\hat a)^2$.

Note that in this Version I of the algorithm, the estimate of $\sigma$ may
depend on the estimate of $a$. It should also be noted that the above
estimation procedure for $a$ assumes large n and small $\Delta_n$.
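
Steps 1, 2 and 4 can be sketched in Python as follows. This is a hedged illustration only: the function name `version_one_statistic` is hypothetical, the drift estimate of Step 3 is assumed to be supplied by the caller as a vectorized function `sigma_hat`, the grid is taken to be uniform on [0,1], and the discretized statistic is assumed to be a sum of drift-times-increment terms minus half the squared drift, of the same form as the recursive expressions given for the Gaussian case. Expression (5.4) itself is not reproduced here.

```python
import numpy as np

def version_one_statistic(F, x, sigma_hat):
    """Return (statistic, a_hat_squared) for a sample x of length n."""
    x = np.asarray(x, dtype=float)
    n = x.size
    delta = 1.0 / n                                   # uniform sampling interval
    dz = np.linalg.solve(np.tril(F), x)               # Step 1: whitened increments
    a_hat_sq = float(np.sum(dz**2))                   # Step 2: quadratic variation
    z = np.concatenate(([0.0], np.cumsum(dz)[:-1]))   # Z at left endpoints, Z(0) = 0
    s = sigma_hat(z)                                  # drift supplied from Step 3
    lam = float(np.sum(s * dz - 0.5 * delta * s**2))  # assumed discretized form
    return lam / a_hat_sq, a_hat_sq                   # Step 4: divide by a_hat^2
```

Because the quadratic-variation estimate is reliable only for large n and small sampling interval, a short record can give a poor value of the divisor in Step 4.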

In the likelihood ratio with time-varying $\sigma$, as in assumption (A.3), one
can proceed by first estimating $A$ for each sample path of the training
ensemble, and then estimating $\sigma(i,\cdot)$ for each $i \le n$ by using these estimated
values of $A$. Of course, a better solution would be to know $\sigma$ from a
mathematical model, if such a model were available.

The algorithms given in Section 3.4 are special cases of these
algorithms, obtained by setting $\hat a = 1$.

5.  COMMENTS  ON  ROBUSTNESS  AND  RANGE  OF  APPLICATIONS
OF  THE  DETECTION  ALGORITHMS


The results summarized in this chapter, and given in detail in [17] and
[27], give a complete set of sufficient conditions and necessary conditions
for the existence of a likelihood ratio, under the assumption that the noise
process $(N_t)$ is a Gaussian mixture process with the following properties.

(1) $(N_t)$ has a representation $(AG_t)$, where $(G_t)$ is Gaussian, $A$ is a positive
random variable independent of $(G_t)$, and $(G_t)$ is m.s. continuous.

(2) $N(0) = 0$ with probability one.

(3) $(G_t)$ has finite Cramer-Hida multiplicity.

These assumptions are very weak. (1) is typically satisfied in applications.
(2) can be circumvented in applications by assuming that the problem
started at a time instant prior to the first sampling time. (3) is practically
meaningless as a restriction, since the multiplicity is permitted to be
arbitrarily large.

The adjective "complete" in describing the above-mentioned results should
be interpreted as follows. The set of results obtained for this problem is
exactly that which has been known for many years when it is assumed that the
noise is the Wiener process. This is all that one can expect; the unusual
properties of the Wiener process, previously mentioned, permit one to obtain
sufficient conditions and necessary conditions for absolute continuity when
the noise is the Wiener process. Noise processes encountered in applications
typically do not have the fortuitous properties of the Wiener process.
However, through use of the spectral (Cramer-Hida) representation of second-
order stochastic processes, the Girsanov theorem and extensions, and various
results from stochastic calculus, it has been possible to obtain exactly
comparable results for the class of Gaussian and nonGaussian noise processes
described above.


Under  the  same  assumptions,  expressions  for  the  likelihood  ratio  have 
been  obtained  when  the  conditions  sufficient  for  its  existence  are  satisfied. 

These  general  results,  however,  are  only  mathematical  results  without 
further  work.  That  is,  they  must  be  given  an  interpretation  such  that  they 
can  be  implemented  without  requiring  unrealistic  knowledge  of  the  parameters 
appearing  in  the  expressions  for  the  likelihood  ratio. 

Implementation of discrete-time approximations to the exact continuous-time
likelihood ratio would require knowledge of the spectral (Cramer-Hida)
representation of the noise. The representation will typically not be known;
its determination is well-known to be a very difficult unsolved problem (for
processes of a general type). Thus, additional assumptions are necessary to
permit implementation of the likelihood ratios as signal detection algorithms.
The basic additional assumptions are that the noise process is of unit
spectral multiplicity, and that the signal-plus-noise process is a filtered
diffusion. The first of these two assumptions can be justified from two facts:
unit-multiplicity second-order processes are dense (in a mean-square
sense) in the linear space of all mean-square continuous processes, and
all wide-sense stationary processes and all discrete-time second-order
processes are of unit multiplicity. The second assumption still permits one to
consider a very large class of signal-plus-noise processes under the
assumption that a likelihood ratio exists.

With  these  additional  assumptions,  two  discrete-time  approximations  to 
the  likelihood  ratio  have  been  given.  These  approximations  have  very 
reasonable  implementations,  requiring  prior  knowledge  of  the  noise  covariance 
matrix.  In  the  fully  adaptive  version,  the  remaining  parameter  of  the 
detection algorithm can be easily obtained. For Version II, if a good
mathematical model is not available, then this parameter is best obtained from
a representative sample of "training" signal-plus-noise data.

The amount of training data required may be less than that required to
obtain a reliable estimate of the S+N covariance matrix; thus, the Version II
algorithm can be used in some situations where the S+N process is Gaussian but
the S+N covariance cannot be reliably determined. In many applications, as
for example the highly-complicated world of underwater acoustics,
mathematical-physical models of important signal-plus-noise processes have been sought for
many  years,  generally  with  only  limited  success.  Even  when  such  models  are 
derived,  they  may  be  the  result  of  a  great  many  approximations,  and  may 
require  detailed  knowledge  that  is  not  typically  available.  It  is  clearly 
desirable  to  have  detection  algorithms  that  do  not  rely  on  such  models  for 
their  effectiveness.  From  this  aspect,  the  fully  adaptive  Version  I  algorithm 
is  preferable. 

Of course, under the assumptions used here, implementation of the detection
algorithms requires only knowledge of the drift of a diffusion. For a
particular application, a successful modeling effort resulting in determination
of  the  drift  function  would  enable  the  implementation  of  the  more  powerful 
Version  II  algorithm  without  the  need  to  have  an  ensemble  of  training  data. 

Numerical  evaluations  of  the  two  algorithms,  using  both  simulated  and 
experimental  data,  have  given  encouraging  results. 

As  can  be  seen  from  the  preceding  development,  the  theoretical  basis  of 
these two algorithms and their implementable form give reasons to believe that
they  may  provide,  at  least  in  many  applications,  a  satisfactory  solution  to  a 
long-standing  and  much-encountered  detection  problem:  detection  of  Gaussian  or 
nonGaussian signals imbedded in Gaussian or Gaussian mixture noise.

6.  CONCLUDING  REMARKS


Many discrete-time finite-sample detection algorithms are obtained from
consideration of only the discrete-time (and finite-dimensional) problem. If
this is done, and the data represents discretized continuous-time data, then
the problem of developing an optimally-effective algorithm is akin to that
which the blind man faces in describing the elephant. It is obviously
preferable, if possible, to develop a discrete-time algorithm based on
approximations to the likelihood ratio for the continuous-time problem.

However,  likelihood  ratios  for  continuous-time  problems  are  inextricably 
bound  (if  done  correctly)  to  the  problem  of  existence  of  the  likelihood  ratio, 
or  absolute  continuity.  Unfortunately,  studies  on  existence  of  likelihood 
ratios  are  considered  to  be  of  only  mathematical  interest  by  many  engineers 
concerned  with  systems  design  and  by  many  applied  statisticians  concerned  with 
inference.  This  attitude  may  well  be  justified  if  the  studies  are  on  models 
not  suitable  for  applications,  or  if  the  results  of  the  study  are  arcane 
theorems  left  in  a  form  unintelligible  to  potential  users,  or  if  the  results 
are  expressions  that  require  knowledge  of  quantities  that  one  cannot 
realistically  expect  to  be  known.  As  can  be  seen  from  the  preceding 
development,  appropriate  studies  on  absolute  continuity  and  likelihood  ratios 
for  continuous-time  problems  can  be  extremely  important  in  developing 
practical  discrete-time  detection  algorithms. 

7.  ACKNOWLEDGEMENTS 

The  mathematical  research  summarized  here  was  supported  by  grants  from 
the  Mathematical  Sciences  Division  of  the  Office  of  Naval  Research  and  by  the 
Networking  and  Communications  Research  and  Infrastructure  Program  of  the 
National Science Foundation. ONR-MSD also funded the establishment of a
dedicated computer facility; an equipment grant from the Mathematical and
Information Sciences Division of the Air Force Office of Scientific Research
provided a major upgrade. Computational work on data analysis, modeling, and
algorithm evaluations was supported by the Signal Processing Branch of the
Naval Ocean Systems Center. We thank each of these organizations, and
especially the cognizant program directors, for their support.




REFERENCES 


[1] C.W. Helstrom, Statistical Theory of Signal Detection, Pergamon, New
York (1960).

[2] H.L. van Trees, Detection, Estimation, and Linear Modulation Theory, Part
I, Wiley, New York (1968).

[3] C.R. Rao and V.S. Varadarajan, Discrimination of Gaussian processes,
Sankhya Ser. A, 25, 303-330 (1963).

[4] M. Rosenblatt, ed., Proceedings of the Symposium on Time Series Analysis,
Wiley, New York (1963). See Chapter 11 (E. Parzen), Chapter 19 (G.
Kallianpur and H. Oodaira), Chapter 20 (W.L. Root), and Chapter 22 (A.M.
Yaglom).

[5] T.E. Duncan, Likelihood functions for stochastic signals in white noise,
Inf. Control, 16, 303-310 (1970).

[6] T.T. Kadota and L.A. Shepp, Conditions for absolute continuity between a
certain pair of probability measures, Z. Wahrscheinlichkeitstheorie und
Verw. Gebiete, 16, 250-260 (1970).

[7] T. Kailath and M. Zakai, Absolute continuity and Radon-Nikodym
derivatives for certain measures relative to Wiener measure, Ann. Math.
Statist., 42, 130-140 (1971).

[8] J. Memin, Sur quelques problemes fondamentaux de la theorie du filtrage,
These (troisieme cycle), U.E.R. Mathematiques et Informatique, University
of Rennes (1974).

[9] R.S. Liptser and A.N. Shiryayev, Statistics of Random Processes, I.
General Theory, Springer, Berlin-Heidelberg-New York (1977).

[10] C.R. Baker, On equivalence of probability measures, Annals of Probabil.,
1, 690-698 (1973).

[11] J.L. Lawson and G.E. Uhlenbeck, Threshold Signals, McGraw-Hill, New
York, pp. 161-163 (1950).

[12] C.R. Baker, Optimum quadratic detection of a random vector in Gaussian
noise, IEEE Trans. Comm. Tech., 14, 802-805 (1966).

[13] C.R. Baker, On the deflection of a quadratic-linear test statistic, IEEE
Trans. Inform. Theory, 15, 16-21 (1969).

[14] C.R. Baker, Absolute continuity of measures on infinite-dimensional
spaces, Encyclopedia of Statistical Sciences, Vol. 1, 3-11, John Wiley &
Sons, New York (1982).

[15] C.R. Baker and R.E. Cunningham, Results of statistical tests on active
sonar data, USN J. Underwater Acoustics, 26, 247-264 (1976).




[16] C.R. Baker and A.F. Gualtierotti, Likelihood ratios and signal detection
for nonGaussian processes, Stochastic Processes in Underwater Acoustics,
Lecture Notes in Control and Information Science, 85, 154-180,
Springer-Verlag, Berlin (1986).

[17] C.R. Baker and A.F. Gualtierotti, Discrimination with respect to a
Gaussian process, Probabil. Theory Rel. Fields, 71, 159-182 (1986).

[18] H. Cramer, On some classes of non-stationary stochastic processes,
Fourth Berkeley Sympos. Math. Stat. and Probabil., 2, 57-77, Univ. Calif.
(1961).

[19] T. Hida, Canonical representations of Gaussian processes and their
applications, Mem. Coll. Science, 33A, 109-115, Univ. Kyoto (1960).

[20] K. Itô, The topological support of Gaussian measure on Hilbert space,
Nagoya Math. J., 38, 181-183.

[21] J.L. Doob, Stochastic Processes, Wiley, New York (1950).

[22] K.R. Parthasarathy, Probability Measures on Metric Spaces, Academic
Press, New York (1967).

[23] D. Freedman, Markov Chains, Springer, Berlin-Heidelberg-New York (1983).

[24] A. Ephremides, A property of random processes with unit multiplicity, J.
Multivariate Analysis, 7, 525-534 (1977).

[25] A.F. Gualtierotti, Some remarks on spherically-invariant distributions,
J. Multivariate Anal., 4, 347-349 (1974).

[26] A.F. Gualtierotti, A likelihood ratio formula for spherically-invariant
processes, IEEE Trans. Inform. Theory, IT-22, 610 (1976).

[27] C.R. Baker and A.F. Gualtierotti, Absolute continuity and mutual
information for Gaussian mixtures, LISS-28, Department of Statistics,
University of North Carolina, Chapel Hill (1988).

[28] J. Jacod, Calcul Stochastique et Problemes de Martingales, Lecture Notes
in Mathematics, 714, Springer, Berlin (1979).




